Abstract


Parallel reversal schedules describe how to calculate the states of an evolutionary system, 
such as atmospheric and oceanographic simulations, in reverse order without having to 
keep all states in memory. This is possible without any increase in computation time, 
by recalculating intermediate results using multiple processors on a parallel computer 
system. These schedules are applied not only to physical simulations that need to run in
reverse, but also in algorithmic differentiation, which in turn is used, among other things, in
nonlinear optimization and to solve partial differential equations. Earlier research led to 
optimal schedules under the central assumption that if k states can be kept in memory, 
there are enough processors to run computations on roughly half of them in parallel, 
while the other half can only be used for holding checkpoints. 

This diploma thesis is an attempt to continue the research by relaxing the central 
assumption, such that memory for a large number of plain checkpoints can be used with 
a comparatively small number of processors. To cope with this challenge, a symbolic 
approach to reversal schedules is proposed, and a comprehensive algebra is developed 
to analyze schedules via their profiles. This algebra is very generic and could have 
applications outside parallel reversal schedules. Using that instrument, new optimal 
schedules are developed, to be applied in situations where the earlier schedules are not 
applicable. Moreover, suboptimal schedules are provided where optimal schedules could 
not be found in a systematic way. 





Acknowledgments 


First of all, I would like to thank Prof. Andreas Griewank for providing the opportunity 
to write my thesis on this exciting topic, for his patience, for guiding me through the 
research and writing phases, and most of all, for accepting me as his last diploma degree 
candidate before leaving the Humboldt University of Berlin. 

I would also like to thank Prof. Andrea Walther not just for her solid dissertation on 
which this thesis is based, but also for her advice and support whenever I needed it. 

Moreover, I appreciate the advice given by Dr. Louchka Popova-Zeugmann on my
Petri net drafts. 

I am particularly grateful for the continuous motivation given by Berit Grußien, who
also read drafts and gave valuable hints to bring this thesis into shape. 

Finally, and most importantly, I am deeply indebted to my wife Antje for her unfailing 
support during this intensive phase of my life. 





Contents 


Abstract 3 

Acknowledgments 4 

1. Introduction 7 

1.1. Reversal of Evolutionary Systems 7 

1.2. Parallel Reversal Schedules 9 

1.3. Relaxing Processor-Checkpoint Convertibility 11 

1.4. Scope and Structure of this Thesis 12 

1.5. Notations and Conventions 13 

2. Algebra on Schedules 14 

2.1. Generic Parallel Schedules 14 

2.2. Parallel Reversal Schedules 19 

2.3. Decomposition and Composition 22 

3. Algebra on Profiles 30 

3.1. Profile Space 30 

3.2. Shift, Duration and Final Value 35 

3.3. Complete Profiles 37 

3.4. Arrangement 42 

3.5. Algebra Summary 46 

3.6. Schedule Profiles 48 

4. Optimal and Suboptimal Schedules 56 

4.1. Fibonacci Schedules 56 

4.2. Exhaustive Search 58 

4.3. One Process 59 

4.4. Two Processes 60 

4.5. Three Processes 62 

4.6. Four or more Processes 64 

4.7. Summary 67 

5. Conclusion 68 

A. Tool for Generating Schedules 69 

B. Time Split in Schedule Algebra 71 


C. Polynomials as Profiles 73

D. Notable Search Result for p=4, k=8 76 

List of Figures 80 

Bibliography 81 

Statements 83 





1. Introduction 


1.1. Reversal of Evolutionary Systems 

Evolutionary systems are mathematical models that can be described as a sequence of 
states 


$$s_0, s_1, s_2, \dots, s_\ell$$

where each state can be calculated directly from the previous state. That is, there are
functions $F_1, F_2, \dots, F_\ell$ such that:

$$\begin{aligned}
s_1 &= F_1(s_0) \\
s_2 &= F_2(s_1) \\
&\;\;\vdots \\
s_\ell &= F_\ell(s_{\ell-1})
\end{aligned}$$

Typical examples are physical simulations, such as atmospheric and oceanographic
simulations [GW08, p. 278]. Here, each state represents the model at a certain point
in time, and each function $F_i$ calculates the simulation of one time step. To prevent
confusion of the simulated time with the computation time, we introduce the term
physical step to always refer to the simulated time, not the computation time. More
precisely, we say that $F_i$ calculates (or simulates) the physical step $i$ by mapping the
previous state $s_{i-1}$ to the state $s_i$.

$$s_0 \xrightarrow{F_1} s_1 \xrightarrow{F_2} \cdots \xrightarrow{F_\ell} s_\ell$$

Their composition $F$ then describes the whole model:

$$F = F_\ell \circ \cdots \circ F_1$$

$$s_0 \xrightarrow{F} s_\ell$$

Written like that, evolutionary systems are a special case of mathematical models, but 
also describe the general case. 

1. They are a special case in the sense that each state $s_i$ depends only on $s_{i-1}$, while
we would normally expect it to depend on all preceding states $s_0, s_1, \dots, s_{i-1}$.

2. On the other hand, we could always define every state $s_i$ to contain all intermediate
states calculated so far, in which case every calculation could be represented as an
evolutionary system.

To emphasize that we mean the first kind of models and not the second one, we demand
that all states $s_i$ have roughly the same size. More precisely, we demand that all $s_i$ can
be represented within the same amount of computer memory $\operatorname{mem}(s_i)$, and that we have
working implementations of all $F_i$ which operate on those representations.

$$\operatorname{mem}(s_0) = \operatorname{mem}(s_1) = \dots = \operatorname{mem}(s_\ell)$$
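The definition above can be made concrete with a minimal Python sketch. It is only an illustration under assumptions that are not part of the thesis: the state is a single float instead of a large array, and the name `forward_sweep` and the toy step function are hypothetical.

```python
from typing import Callable, List, Sequence

State = float  # assumption: a toy state; real states would be large arrays

def forward_sweep(steps: Sequence[Callable[[State], State]], s0: State) -> List[State]:
    """Return [s_0, s_1, ..., s_l], where s_i = F_i(s_{i-1})."""
    states = [s0]
    for step in steps:  # steps[i - 1] plays the role of F_i
        states.append(step(states[-1]))
    return states

# Hypothetical toy system with l = 4 physical steps:
# every step halves the previous state and adds 1.
steps = [lambda s: 0.5 * s + 1.0 for _ in range(4)]
states = forward_sweep(steps, 0.0)
```

Every state in `states` has the same size, which mirrors the memory requirement stated above; keeping the whole list is exactly what the later chapters try to avoid.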

In this thesis, we are concerned with the following question: If we are given an
evolutionary system with $\ell$ physical steps and the initial state $s_0$, how do we compute the
states in reverse order?

$$s_0 \xrightarrow{(?)} s_\ell \xrightarrow{?} s_{\ell-1} \xrightarrow{?} \cdots \xrightarrow{?} s_1 \xrightarrow{?} s_0$$

This question arises in several situations. For example, the evolutionary system may
be a physical simulation displayed to the user, and once the user has seen all states
$s_0, \dots, s_\ell$, they want to see the simulation in reverse.

Another application is algorithmic differentiation, which is a technique to calculate
derivatives, with applications in nonlinear optimization, solving partial differential
equations and others. In particular, we are interested in the reverse mode of algorithmic
differentiation, which calculates adjoint derivatives such as gradients. The reverse mode
is not restricted to evolutionary systems, but those allow for a more memory-efficient
treatment.¹ It requires, however, that the reverse mode has access to all states $s_i$ in
reverse order.

So how exactly do we perform a reverse computation of an evolutionary system? One 
simple approach is: 


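$$s_0 \xrightarrow{F_\ell \circ \cdots \circ F_1} s_\ell \xrightarrow{F_\ell^{-1}} s_{\ell-1} \longrightarrow \cdots \xrightarrow{F_2^{-1}} s_1 \xrightarrow{F_1^{-1}} s_0$$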


That is, we first calculate $s_\ell$ from $s_0$ by applying $F_1, \dots, F_\ell$. Then, one by one, we
apply $F_\ell^{-1}, \dots, F_1^{-1}$, which yield the states in reverse order. However, this is only
possible if all functions $F_i$ are invertible and we have working implementations of all
$F_i^{-1}$. Therefore, this approach is not feasible for most models. The exact reasons vary
from case to case, but are often a combination of:

1. The $F_i$ are not injective and it is hard or impossible to choose the right preimage
$s_{i-1} \in F_i^{-1}(s_i)$ when a state $s_i$ is given.²

2. Computing all $F_i^{-1}$ is orders of magnitude more expensive than computing all $F_i$.

3. Inversion of the $F_i$ is mathematically possible, but numerically ill-conditioned.
This happens for heat equations and other diffusive processes.

For those cases, where we do not have working implementations of the $F_i^{-1}$, reversal
schedules come into play.

1 This approach has been used successfully, for example, in seismic research. [Sym07] 

² In some cases we can deal with this by storing additional hints in the result state, but in general this
results in accumulating hints from previous states, making the states $s_0, s_1, s_2, \dots$ grow from step
to step, which is not what we want.



1.2. Parallel Reversal Schedules 


Suppose we want to reverse an evolutionary system of $\ell$ physical steps

$$s_0 \xrightarrow{F_1} \cdots \xrightarrow{F_\ell} s_\ell$$

where we do not have working implementations of the $F_i^{-1}$. How do we achieve this?

One possibility is to simply save all states $s_i$ as they are computed. This total recall
strategy is depicted in Figure 1. The x-axis indicates the computation time, while the
y-axis indicates the states $s_i$. Note that throughout this thesis, we assume that all $F_i$
require roughly the same computation time, and define one time unit to be exactly that
time.³

$$\operatorname{time}(F_1) = \operatorname{time}(F_2) = \dots = \operatorname{time}(F_\ell) = 1$$

The dots denote which states are available in computer memory at what point in time.
The lines which connect the dots denote various actions.

The simplest action is checkpoint holding (C), denoted by a solid horizontal line,
which means that the same state $s_i$ is still available after one time unit has passed. This
is often implemented by checkpointing, that is, copying the current computing state to
a separate memory location. That checkpoint is then held in memory until it is restored
at a later point in time.

Another action is forward (F), denoted by a solid diagonal line (/), which means that
$F_{i+1}$ is being applied to state $s_i$, so after one time unit state $s_{i+1}$ is available.

This would be all we need. But for algorithmic differentiation, two more actions
are needed and scheduled explicitly. These are depicted as dashed lines in Figure 1.
To understand those actions, it is necessary to know that the reverse mode does not
only need the states $s_i$ in reverse order. It needs all intermediate results that appear
during evaluation of all physical steps $F_i$, in reverse order. We meet this requirement
by replacing the last F action with a prepare action (P), depicted as a dashed forward

³ Note that [Wal99] also considered varying $\operatorname{time}(F_i)$, but only for sequential schedules, not for parallel
schedules.



Figure 1: Total recall schedules for the reversal of $\ell = 4$ physical steps. The middle
schedule incorporates the prepare and reverse actions explicitly, as needed for
algorithmic differentiation. The right schedule shows the plain reverse mode
that contains only prepare and reverse actions without any checkpointing.
diagonal line. This also evaluates $F_{i+1}$, but stores all intermediate values into a
separate stack in memory. This is also denoted as $\hat{F}_{i+1}$.

Every P action is followed by a reverse action (R), depicted as a dashed backward
diagonal line (\). This starts or continues the actual reversal, retrieving all intermediate
values in reverse order from the prepared stack. This is also denoted as $\bar{F}_{i+1}$. Throughout
this thesis we assume that P and R actions have roughly the same computational
complexity as F actions, that is:⁴

$$\operatorname{time}(F_i) = \operatorname{time}(\hat{F}_i) = \operatorname{time}(\bar{F}_i) = 1$$

Figure 1 also demonstrates that during every R action, there is also the P action running
for the previous physical step, except for the last R action. So for $\ell > 1$ this schedule
requires at least two processes running in parallel. That is, it requires at least either
a two-core processor or a cluster of two computers. Moreover, up to three checkpoints
must be held in memory simultaneously. The total number of C actions is $\ell(\ell - 1)$, the
total number of F actions is $\ell - 1$.

Note that this schedule runs in optimal computation time. Under the given preconditions,
we can’t do any better than an uninterrupted forward calculation followed by an
uninterrupted reverse calculation. In this thesis we consider only that kind of schedule,
also called feasible parallel reversal schedules [Wal99, p. 58], which means that we
always have $t_M = 2\ell$, where $t_M$ is the total computation time according to the schedule.⁵

While the total recall strategy is wasteful with memory, it is still a major improvement
over the plain reverse mode with regard to memory usage. As shown in Figure 1, the plain
reverse mode would create a huge stack of all intermediate values during the evaluation
of all physical steps $F_1, \dots, F_\ell$. This is a lot more than the memory required to store
just all states $s_0, \dots, s_\ell$.

But why should we care about memory at all? The obvious reason is of course economics.
There is a cost limit to attaching more RAM and hard drives to a computer system. The
less obvious reason is speed, because the reverse mode is memory bound. Going from 
L3 cache to RAM already incurs a considerable penalty, even more so when going from 
RAM to solid-state drives and from there to hard drives. Since the memory is accessed 
sequentially, the access pattern is already optimal. The various caching systems won’t 
provide much speedup here, as their main task is to transform random access patterns 
into sequential access patterns. In the end, the slowest memory in the chain will be 
the bottleneck. If “wasting” lots of CPU cycles through recalculation leads to avoiding 
the slow memory, this is a good tradeoff, as we are using processing resources that 
would otherwise waste their time waiting for memory components to respond. It was 
shown that this speedup is measurable not just on parallel systems, but also on a single 
computer with a single-core CPU. [GW08, pp. 293-294] 

The opposite of the total recall strategy is the total recalculation strategy, which is


⁴ Note that [Wal99] also considered scenarios where the times of the P and R actions are whole-number
multiples of the time of F actions.

⁵ The exact formula in [Wal99, p. 58] is $t_M = (\ell - 1) + \hat{t} + \ell\,\bar{t}$, but our previous assumptions translate
to $t = \hat{t} = \bar{t} = 1$, so this formula simplifies to $t_M = 2\ell$.
depicted in Figure 2. Here, every state $s_i$ is recalculated from the base $s_0$. Apart from
the obligatory P and R actions, there are only F actions in this schedule. C actions are
only used to keep $s_0$ available.⁶ In this schedule, up to three processes run in parallel,
while at most one checkpoint needs to be held in memory. The total number of C actions
is $2\ell - 2$, the total number of F actions is $\ell(\ell - 1)/2$.

Of course, there is a large degree of freedom between total recall and total recalculation 
schedules. 
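The two extreme strategies can be contrasted in a short Python sketch. It is a sequential simplification and not the schedules of Figures 1 and 2: it ignores the P and R actions and all parallelism, and it counts every application of a step function, including the last step of each chain, which the schedules perform as a P action. The function names and the toy system are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Step = Callable[[float], float]

def reverse_total_recall(steps: Sequence[Step], s0: float) -> Tuple[List[float], int]:
    """Store every state once, then read the stored states in reverse order."""
    states = [s0]
    evaluations = 0
    for step in steps:
        states.append(step(states[-1]))
        evaluations += 1
    return states[::-1], evaluations  # memory: l + 1 states

def reverse_total_recalculation(steps: Sequence[Step], s0: float) -> Tuple[List[float], int]:
    """Keep only s_0 and recompute each state s_l, s_{l-1}, ..., s_1 from it."""
    reversed_states = []
    evaluations = 0
    for i in range(len(steps), 0, -1):  # i = l, l-1, ..., 1
        s = s0
        for step in steps[:i]:
            s = step(s)
            evaluations += 1
        reversed_states.append(s)
    reversed_states.append(s0)
    return reversed_states, evaluations  # memory: s_0 plus one working state

# Hypothetical toy system with l = 4 physical steps.
steps = [lambda s: 0.5 * s + 1.0 for _ in range(4)]
recall_states, recall_evals = reverse_total_recall(steps, 0.0)
recalc_states, recalc_evals = reverse_total_recalculation(steps, 0.0)
```

Both functions return the same reversed sequence. In this sequential count, total recall needs $\ell$ step evaluations but $\ell + 1$ stored states, while total recalculation needs $\ell(\ell+1)/2$ evaluations with only the initial state stored.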


1.3. Relaxing Processor-Checkpoint Convertibility 

In this thesis we are concerned with the following questions: Depending on the number 
of processes that can truly run in parallel on our hardware, what are the best feasible 
parallel reversal schedules? How many physical steps can be reversed in optimal time 
within what amount of memory? 

So we have a discrete optimization problem with three parameters: 

$\ell$ - number of physical steps to be reversed
$p$ - number of processes that can run in parallel
$k$ - number of states that fit into memory, to be used by processes and checkpoints

So at any point in time within a schedule, the number of actions (C, F, P, R) must be $\le k$.
Moreover, the number of F, P and R actions must be $\le p$.

Since it doesn’t make sense to start processes for whose execution we have no memory,
we demand that:

$$p \le k$$

Note that $k$ has the same meaning as what is called resources in [Wal99], while $p$ is an
additional degree of freedom not present in [Wal99]. Due to the introduction of $p$, we
relax the assumption of processor-checkpoint convertibility [Wal99, p. 62], which says that
processors and checkpoints are interchangeable resources.


⁶ It may be debatable whether keeping the initial state $s_0$ is checkpointing in the classic sense. However,
in our systematics we consider it to be a C action, as it holds one state in memory for one time unit.



Figure 2: Total recalculation schedules for the reversal of $\ell = 4$ physical steps. The right
schedule incorporates the prepare and reverse actions explicitly, as needed for
algorithmic differentiation.
This was a reasonable assumption in 1999, but today’s laptops have 8 GiB RAM shared
by 4 cores. Assuming we have an evolutionary system with large states $s_i$, where each
state consists of 10 million double precision floating point values (~80 MiB), we would
have $k > 100$ but only $p = 4$. That ratio becomes worse when looking at today’s expensive
computers, which have 1 TiB RAM shared by 120 cores, leading to $k > 13000$ versus
$p = 120$. For evolutionary systems with smaller states, the ratio is even worse.
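The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the helper name `states_in_memory` is hypothetical, and it assumes that the whole RAM is available for states and that a double precision value occupies 8 bytes.

```python
GIB = 2**30
TIB = 2**40

def states_in_memory(ram_bytes: int, values_per_state: int, bytes_per_value: int = 8) -> int:
    """Number of whole states that fit into the given amount of memory."""
    return ram_bytes // (values_per_state * bytes_per_value)

# 10 million doubles per state, as in the text (80,000,000 bytes per state).
k_laptop = states_in_memory(8 * GIB, 10_000_000)  # 4 cores
k_server = states_in_memory(1 * TIB, 10_000_000)  # 120 cores

ratio_laptop = k_laptop / 4
ratio_server = k_server / 120
```

Under these assumptions the laptop holds 107 states for 4 cores and the large machine 13743 states for 120 cores, so the ratio $k/p$ grows from about 27 to about 115.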

So we relax the processor-checkpoint convertibility and assume that processors (F, P, R
actions) can be converted to checkpoints (C actions) by “doing nothing”, but not vice
versa. From that point of view, our additional parameter $p$ describes how many of the
$k$ resources can be used as processors, while the remaining $k - p$ resources can only be
used as checkpoints.

At a first glance, this seems to imply that the schedules developed in [Wal99] are
only applicable if we choose $p = k$. However, according to [Wal99, p. 79] the
optimal schedules for $k$ resources require only roughly $k/2$ processes. So these schedules
are applicable and optimal for all $p$ from roughly $k/2$ up to $k$.

Nevertheless, for smaller $p$ we need to develop new schedules, as the schedules of [Wal99]
are not applicable. This will result in new schedules that require fewer processors by
using more checkpoints and thus more memory.

1.4. Scope and Structure of this Thesis 

While there has been interest in reversal schedules since at least 1973 [Ben73], the first
systematic analysis of optimal schedules was published in 1999 in the dissertation [Wal99].
Research continued for about 5 years, leading to another dissertation [Leh03] and various 
papers [GW00] [LW02] [Wal04], all within the same research group. Alas, this topic has 
never been picked up by other research groups. Today, books which cover reversal 
schedules all refer to the same results from a decade ago. [GW08, pp. 278-297] [NS12, 
pp. 142-147] 

This diploma thesis is an attempt to continue the research on reversal schedules by
relaxing the processor-checkpoint convertibility, which is a central assumption of [Wal99].
This introduces two challenges:

1. The central binary decomposition theorem [Wal99, p. 66] does not hold anymore.

2. The discrete optimization problem has three instead of two parameters: $p, s, \ell$
instead of $s, \ell$.

To cope with those challenges, this thesis proposes and follows a more symbolic approach
to reversal schedules. Also, to keep things manageable, this thesis focuses on parallel
reversal schedules in the case $t = \hat{t} = \bar{t} = 1$.

It is structured as follows. In Chapter 2, a small formalism is developed for describing 
schedules, showing which properties still hold and which properties don’t hold anymore 
due to the relaxation. In Chapter 3, a comprehensive algebra for describing and
analyzing the profiles of schedules is developed. Finally, this is used in Chapter 4 to build
new optimal and suboptimal schedules, which are then compared to the already known 
schedules, but also to the result of an exhaustive search over small schedules. 

1.5. Notations and Conventions 

Throughout this thesis, we use the following notations and apply the following conventions:

$\mathbb{B}$                     Set of Booleans, $\mathbb{B} = \{\text{true}, \text{false}\}$
$2^A$                            Power set of the set $A$
$f^{-1}(A)$                      Preimage of the set $A$ under the function $f$
$\chi_A$                         Indicator function $\chi_A : \mathbb{R} \to \{0, 1\}$ of the subset $A \subseteq \mathbb{R}$
$e_i$                            The $i$th unit vector in $\mathbb{R}^n$
$F_n$                            The $n$th Fibonacci number with $F_1 = 1$, $F_2 = 1$ and $F_n = F_{n-1} + F_{n-2}$
$\mathbb{F}_n$                   The finite field with $n$ elements, where $n$ is the power of a prime
$A^*$                            The free monoid over the set $A$
$\lambda$                        The empty word $\lambda \in A^*$
$\mathbb{Z}\langle A\rangle$     The free, associative, unitary $\mathbb{Z}$-algebra over the set $A$
Axis order                       We follow the usual convention in mathematics that the time axis is
                                 the x-axis and points right, while the y-axis points upwards. So we
                                 deviate from [Wal99, p. 5], where the y-axis points downwards. We
                                 also deviate from [Leh03, p. 24], where the time axis is the y-axis and
                                 points downwards.
2. Algebra on Schedules


In this chapter we explain parallel reversal schedules and their operations in a formal 
way, assuming the relaxed processor-checkpoint convertibility. We will then check which 
central statements of [Wal99] still hold and which don’t. 

There are many ways to represent parallel reversal schedules formally, each with its own 
strengths and weaknesses in different areas. The algebraic representation presented here 
has been developed with the following goals: The definition of feasible parallel reversal 
schedules, as well as their composition, should be straightforward. Moreover, checkpoint
persistence and processor persistence should be explainable within the formalism as 
clearly as within the visual representation. 

The weakness of the algebra is that it is harder to see which exact actions are happening 
at a certain point in time. For reference, Appendix B shows how this can be done within 
the schedule algebra, but there is a more elegant way: We will simply introduce a 
separate algebra on schedule profiles, which will be the topic of Chapter 3, combining 
the best of both worlds. 


2.1. Generic Parallel Schedules 

In this section we will develop a generic algebraic representation of parallel schedules 
that works with an arbitrary set A of actions. However, to keep the examples closer to 
the problem at hand, we will stay within our set of actions as introduced in Section 1.2. 
We will explain how this algebra represents our schedules, and visualize the various 
operations. Finally, we will point out the corner cases in which the algebra differs from 
the visual language. 


Definition 1 (Actions). The set of actions A is: 

A = {C,F,P,R} (1) 

where C, F, P and R are to be interpreted as the actions introduced in Section 1.2. 

Before getting to schedules, we define an intermediate structure, a task, by which we
mean a sequential chain of actions that are to be executed one after another.¹


1 We could also call this a sequential schedule as opposed to a parallel schedule, but we choose the term 
task to prevent confusion. Throughout this thesis, schedule always means parallel schedule. 
Definition 2 (Task). A task $w$ is a finite sequence, or word, of actions. That is, the set
of tasks is the free monoid over $A$:

$$w \in A^*$$

The empty task is denoted as the empty word $\lambda \in A^*$.

Definition 3 (Task duration). The duration $|w|$ of a task $w \in A^*$ is the number of its
actions:

$$|w| = \begin{cases} 0 & \text{for } w = \lambda \\ n & \text{for } w = a_1 \cdots a_n,\ a_i \in A \end{cases} \qquad (2)$$

This is also known as the length of a word, the length of a monomial [Coh85, p. 60] or
the degree of a monomial.

Remark. Here we assume that every action needs exactly one time unit to run. We can
do so without loss of generality, because if the actions had different rational durations, we
could redefine our time unit according to the lowest common denominator and split each 
action into repeated smaller actions. For example, if the P action took twice as long as 
the F action, we could introduce a new action P' and redefine P to be the task P'P'. 

Definition 4 (Number of action occurrences). For every action $a \in A$ and task $w \in A^*$
we define $c_a(w)$ to be the number of occurrences of $a$ in $w$:

$$c_a(w) = \begin{cases} 0 & \text{for } w = \lambda \\ |\{i \in \{1, \dots, n\} \mid b_i = a\}| & \text{for } w = b_1 \cdots b_n,\ b_i \in A \end{cases} \qquad (3)$$

Lemma 5 (Task duration inequality). For all $a \in A$, $w \in A^*$ we have:

$$|w| \ge c_a(w) \qquad (4)$$

Proof. $|w|$ counts all actions of $w$ while $c_a(w)$ counts only some of them. □
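Definitions 2 to 4 translate almost literally into code. The following sketch is an illustration only and assumes that a task is encoded as a Python string over the letters C, F, P and R; the function names are hypothetical.

```python
ACTIONS = frozenset("CFPR")  # the set A of Definition 1

def is_task(w: str) -> bool:
    """A task is a word over A; the empty string plays the role of the empty task."""
    return set(w) <= ACTIONS

def duration(w: str) -> int:
    """Task duration |w| of Definition 3: the number of actions."""
    return len(w)

def occurrences(a: str, w: str) -> int:
    """c_a(w) of Definition 4: how often the action a occurs in w."""
    return w.count(a)

tasks = ["FFPR", "FCCPR", "CCCCPR", ""]
forward_counts = [occurrences("F", w) for w in tasks]

# Lemma 5: |w| >= c_a(w) for every action a and every task w.
lemma_5_holds = all(duration(w) >= occurrences(a, w) for w in tasks for a in ACTIONS)
```

The string encoding makes the free monoid structure visible: concatenating tasks is string concatenation, and durations add up under it.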


Example 6. The task $w = FFPR$, where $c_F(w) = 2$, means two forward actions
followed by a prepare and a reverse action. It describes the direct way to reverse the
third physical step.

Example 7. The task $w = FCCPR$, where $c_F(w) = 1$, describes the reversal of the
second physical step, with a delay in between.

Example 8. The task $w = CCCCPR$, where $c_F(w) = 0$, describes the reversal of the first
physical step, with a long delay at the beginning.

Motivation. Now we want to define a “+” operation to overlay multiple tasks to a 
schedule. Here it must be possible to unify the front overlaps, which are the common 
prefixes of the tasks. To achieve this, we choose a simple noncommutative algebra over 
A* that provides distributivity. 

Definition 9 (Schedule). A schedule $f$ over $A$ is an element of the free, associative,
unitary $\mathbb{Z}$-algebra [Coh85, p. 59] over $A$:

$$f \in \mathbb{Z}\langle A\rangle$$

Definition 10 (Schedule duration). The duration $|f|$ of a schedule $f \in \mathbb{Z}\langle A\rangle$ is the
duration of its longest task:

$$|f| = \begin{cases} 0 & \text{for } f = 0 \\ \max\{|w_1|, \dots, |w_n|\} & \text{for } f = d_1 w_1 + \dots + d_n w_n,\ w_i \in A^*,\ d_i \in \mathbb{Z} \setminus \{0\} \end{cases}$$

In the context of algebra, this is also known as the degree [Coh85, p. 60]. In the context
of reversal schedules, this is also known as $t_M$ [Wal99, p. 58].

Definition 11 (Number of tasks). For every schedule $f \in \mathbb{Z}\langle A\rangle$ we define $\ell(f)$ to be
the number of tasks:

$$\ell(f) = \begin{cases} 0 & \text{for } f = 0 \\ d_1 + \dots + d_n & \text{for } f = d_1 w_1 + \dots + d_n w_n,\ w_i \in A^*,\ d_i \in \mathbb{Z} \setminus \{0\} \end{cases} \qquad (5)$$
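One possible way to experiment with Definitions 9 to 11 is to store a schedule as a mapping from task strings to integer coefficients. The sketch below uses `collections.Counter` for that purpose; this representation and the function names are assumptions made for illustration, and the multiplication of the algebra is not modeled.

```python
from collections import Counter

def schedule(*tasks: str) -> Counter:
    """Sum of the given tasks; repeated tasks get a coefficient greater than 1."""
    return Counter(tasks)

def schedule_duration(f: Counter) -> int:
    """|f| of Definition 10: the duration of the longest task with nonzero coefficient."""
    return max((len(w) for w, d in f.items() if d != 0), default=0)

def number_of_tasks(f: Counter) -> int:
    """l(f) of Definition 11: the sum of all coefficients."""
    return sum(f.values())

f = schedule("FFPR", "FCCPR", "CCCCPR")
g = schedule("FFPR", "FCCPR", "CCCCPR", "FFFPR")

# Difference f - g; Counter.subtract keeps zero and negative coefficients.
difference = Counter(f)
difference.subtract(g)
```

Here `difference` contains the single nonzero term `-1 * FFFPR`, which is why integer coefficients rather than only 0 and 1 are convenient.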


Remark. $\mathbb{Z}\langle A\rangle$ is the noncommutative analogue to the ring $\mathbb{Z}[A]$ of polynomials.
Although the only coefficients of interest are 1 and 0, we choose the base ring to be $\mathbb{Z}$,
because $\mathbb{Z}$ does not impose an additional structure on the algebra.² A term like $2 \cdot FFF$ is
then to be interpreted as executing the task $FFF$ twice in parallel, wasting an additional
processor for no use. A negative term like $-1 \cdot FFFPR$ may occur when considering the
difference of two schedules $f - g$, meaning that $f$ is identical to $g$ except that $g$ contains
an additional task $FFFPR$.

Remark. We use the distributivity in $\mathbb{Z}\langle A\rangle$ to extract common prefixes of tasks within a
schedule. The reason, of course, is that starting from the same state, equal tasks (or task
prefixes) will produce equal results at the same points in time, so we will calculate them
only once. After factoring out prefixes to the left as much as possible, the term describes
our schedule. More precisely, the abstract syntax tree of a maximally left-factored term
has the same tree structure as the schedules, when ignoring the connections between
successive reversals and collapsing all intermediate expressions $a \cdot \square$ and $a \cdot (\square + \square)$
with $a \in A$ into single nodes.

² For example, if we had chosen $\mathbb{F}_2$ to be our base ring, we would have exactly the coefficients 1 and 0,
but adding a task to a schedule which already contains that task would make it disappear: For
$S = FPR + CCPR$ we would have $S + CCPR = FPR$, which is not what we want.

Example 12. Adding the tasks of Examples 6 to 8 results in the following schedule: 

$$f = FFPR + FCCPR + CCCCPR$$

It describes the reversal of three physical steps via the total recall strategy: 



When ignoring the connections between successive reversals, this graph forms a tree, the 
schedule tree. Now we factor out all prefixes to the left, which in this case is just the F 
prefix of the first two tasks: 

$$f = F(FPR + CCPR) + CCCCPR$$

Figure 3: AST, collapsed AST and schedule tree of $F(FPR + CCPR) + CCCCPR$.

The collapsed abstract syntax tree has the same structure as the schedule tree, which
is demonstrated in Figure 3.
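The left-factoring of Example 12 can be reproduced mechanically by grouping tasks by their first action, which is the same as building a prefix tree. The following sketch assumes that all coefficients are 1 and that tasks are nonempty strings over C, F, P and R; the function names are hypothetical, and the order of the summands may differ from the one printed above.

```python
from typing import Dict, List

def left_factor_terms(tasks: List[str]) -> List[str]:
    """Summands of the maximally left-factored term for a set of nonempty tasks."""
    groups: Dict[str, List[str]] = {}
    for task in sorted(set(tasks)):
        groups.setdefault(task[0], []).append(task[1:])
    terms = []
    for action, rests in groups.items():
        inner = left_factor_terms([r for r in rests if r])
        if "" in rests and inner:
            inner = ["1"] + inner  # one task ends here while others continue
        if not inner:
            terms.append(action)
        elif len(inner) == 1:
            terms.append(action + inner[0])
        else:
            terms.append(f"{action}({' + '.join(inner)})")
    return terms

def left_factor(tasks: List[str]) -> str:
    return " + ".join(left_factor_terms(tasks))

def unified_action_count(tasks: List[str]) -> int:
    """Number of actions after unifying common prefixes (nodes of the prefix tree)."""
    return len({w[:n] for w in tasks for n in range(1, len(w) + 1)})

example_12 = ["FFPR", "FCCPR", "CCCCPR"]
factored = left_factor(example_12)
shared = unified_action_count(example_12)
unshared = sum(len(w) for w in example_12)
```

For Example 12 the shared F prefix saves exactly one action: the three tasks contain 15 actions in total, while the factored schedule contains 14.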

Remark. Factoring out to the right does not unify the subtasks. It is
significant to which side we are factoring out.

Example 13. In the same schedule as before we factor out all PR suffixes to the right:

$$f = FFPR + FCCPR + CCCCPR = (FF + FCC + CCCC) \cdot PR$$

These PR subtasks all start on different states and, moreover, at different times. So
it makes no sense to unify them; we must repeat them:





Remark. More generally, in our algebra we deliberately unify only common prefixes and
nothing else. This is an important difference between our algebra and the visual schedule
language. We don’t even unify tasks that produce the same state at the same point in
time, if they have different prefixes.

Although this way the algebra introduces a “punishment” on some schedules, this
happens only to schedules that are inefficient anyway, so it makes no sense to introduce
that additional complexity into our algebra as this will not provide any benefit in return.

More precisely, let $u, v \in A^*$ be two tasks without a common prefix that produce
the same state after the same amount of time. Let $w \in A^*$ be an arbitrary common
continuation of both, and $g, h \in \mathbb{Z}\langle A\rangle$ be their diverging ends. Then the schedule
$f = uwg + vwh$ consists of two entirely separate calculations. The modified schedule
$f' = uwg + uwh = uw(g + h)$ is always more efficient, even if we unified the two
common subtasks $w$ in $f$.

Example 14. The overlap in the following schedule: 


$$f = FCCFC + CFCFF$$

is not unified:

$$f \ne f' = (FC + CF)\,CF\,(F + C)$$

where $f'$ is the schedule at the right-hand side of the visual equation above.
2.2. Parallel Reversal Schedules


So far, our algebra is very generic and could have been defined for any set of actions A. 
Now we introduce the specifics of reversal schedules. 

Definition 15 (Parallel reversal schedule). A schedule $f \in \mathbb{Z}\langle A\rangle$ is a parallel reversal
schedule (PRS) if there exist $\ell, d \in \mathbb{Z}_{\ge 1}$ and $w_1, \dots, w_\ell \in \{C, F\}^*$ such that:

$$f = w_1 PR + \dots + w_\ell PR \qquad (6)$$

where:

$$c_F(w_i) = i - 1, \quad i = 1, \dots, \ell \qquad (7)$$

$$|w_i| = d - 1 - i, \quad i = 1, \dots, \ell \qquad (8)$$

Remark. This definition implies $\ell(f) = \ell$ and $|f| = |w_1 PR| = |w_1| + 2 = d$.

Remark. The equations (6) to (8) are to be interpreted as follows:

(6) All tasks consist of a sequence of C and F actions, followed by a PR. This rules
out nonsense tasks such as $CPCRCF$.

(7) The tasks $w_1 PR, \dots, w_\ell PR$ reverse the first, second, ..., $\ell$th physical step, respectively.
This rules out schedules whose reversals are not vertically connected, such as
$CCCPR + FFPR$.

(8) All reversals line up diagonally towards the end. This rules out schedules whose
reversals are not horizontally connected, such as $CCCPR + FPR$.

Definition 16 (Reach). The reach of a PRS $f$ is the number of reversed physical steps,
which by (7) is also the number of tasks $\ell(f)$.

Example 17. The smallest PRS is: 




$$f = PR, \qquad \ell(f) = 1, \qquad |f| = 2$$
Example 18. The second smallest parallel reversal schedule is:

$$f = CCPR + FPR, \qquad \ell(f) = 2, \qquad |f| = 4$$

Example 19. A more complex parallel reversal schedule is: 

f = CCCCCCCCCCPR
  + CCCCFCCCCPR
  + CCCCFCCFPR
  + FCFCCFCPR
  + FFFFCCPR
  + FFFFFPR



Motivation. Definition 15 does not demand a minimal computation time. It includes
schedules such as $f = CCCPR + FCPR$ with $\ell(f) = 2$ and $|f| = 5 > 4$:



These will be ruled out by the feasible parallel reversal schedules. 

Lemma 20 (Minimal reversal duration). The minimal duration of a parallel reversal
schedule $f$ is twice its reach. That is,

$$|f| \ge 2\,\ell(f)$$

and this inequality is sharp.
Proof. Let $d, w_1, \dots, w_\ell$ be as in Definition 15:

$$f = \sum_{i=1}^{\ell} w_i PR = w_1 PR + \dots + w_\ell PR$$

The inequality follows from Lemma 5 for $w_\ell$ and applying (7) and (8):

$$|f| = d = |w_\ell| + 1 + \ell \ge c_F(w_\ell) + 1 + \ell = 2\ell = 2\,\ell(f)$$

Equality holds whenever $|w_\ell| = c_F(w_\ell)$, that is, whenever $w_\ell = F^{\ell-1}$. Among others, this
is the case for all schedules that implement the total recall strategy:

$$f = \sum_{i=1}^{\ell} F^{i-1} C^{2\ell-2i} PR$$

So this inequality is sharp. □

Definition 21 (Feasible parallel reversal schedule). A PRS $f$ is feasible (FPRS) if it
has minimal duration, that is:

$|f| = 2\,\ell(f)$

In other words, there exist $\ell \in \mathbb{Z}_{\ge 1}$ and $w_1, \dots, w_\ell \in \{C, F\}^*$ such that:

$f = \sum_{i=1}^{\ell} w_i PR = w_1 PR + \dots + w_\ell PR$

$c_F(w_i) = i - 1$

$|w_i| = 2\ell - 1 - i$

Remark. Definition 21 corresponds to the feasible parallel reversal schedules as intro- 
duced in [Wal99, p.58] in the case t = t = 1. 
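
The two counting conditions of Definition 21 can be checked mechanically. The following is a minimal sketch, assuming a schedule is given as a list of task strings ordered $w_1 PR, \dots, w_\ell PR$; the function name `is_feasible` and this list encoding are illustrative choices, not notation from the thesis, and the sketch checks only the two conditions on $c_F(w_i)$ and $|w_i|$.

```python
def is_feasible(tasks):
    """Check the counting conditions of Definition 21.

    tasks[i-1] is the string w_i + "PR", for example ["CCPR", "FPR"].
    (Hypothetical encoding chosen for this sketch.)
    """
    reach = len(tasks)
    if reach == 0:
        return False
    for i, task in enumerate(tasks, start=1):
        if not task.endswith("PR"):
            return False
        w = task[:-2]
        if set(w) - {"C", "F"}:
            return False          # w_i must be a word over {C, F}
        if w.count("F") != i - 1:
            return False          # c_F(w_i) = i - 1
        if len(w) != 2 * reach - 1 - i:
            return False          # |w_i| = 2l - 1 - i
    return True


example_19 = [
    "CCCCCCCCCCPR",
    "CCCCFCCCCPR",
    "CCCCFCCFPR",
    "FCFCCFCPR",
    "FFFFCCPR",
    "FFFFFPR",
]
print(is_feasible(example_19))           # schedule of Example 19
print(is_feasible(["CCCPR", "FCPR"]))    # schedule from the Motivation
```

Under this encoding, the schedules of Examples 17 to 19 pass the check, while $CCCPR + FCPR$ from the Motivation fails because $|w_1| = 3 \ne 2$.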

Definition 22 (Resources profile). The resources profile of a schedule $f$ is a step function
(piecewise constant function) with support $[0, |f|)$:

$\operatorname{resp} f : \mathbb{R} \to \mathbb{R}$

where $(\operatorname{resp} f)(t)$ is the number of actions at any point in time $t \in \mathbb{R}$.

Definition 23 (Processes profile). The processes profile of a schedule $f$ is a step function
(piecewise constant function) with support $[0, |f|)$:

$\operatorname{procp} f : \mathbb{R} \to \mathbb{R}$

where $(\operatorname{procp} f)(t)$ is the number of F, P and R actions at any point in time $t \in \mathbb{R}$.





Definition 24 (Usage). The processes and resources usage of a schedule $f$ is the
respective resource peak over time:

$\operatorname{res} f = \max(\operatorname{resp} f)$
$\operatorname{proc} f = \max(\operatorname{procp} f)$

Remark. Since we don't have negative times and our actions cover discrete time durations,
processes and resources, we could have defined $\operatorname{resp} f$ and $\operatorname{procp} f$ to be nonnegative
integer functions $\mathbb{Z}_{\ge 0} \to \mathbb{Z}_{\ge 0}$. However, we prefer them to be functions $\mathbb{R} \to \mathbb{R}$, as this
fits better into the algebra on profiles we will establish in Chapter 3.
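
Definitions 22 to 24 can be illustrated with a small sketch. It assumes that a schedule is a list of task strings and that actions on a common prefix of several tasks are shared, that is, counted once (the left-factored reading). The name `profiles` and the list-per-time-step output are choices made for this sketch only.

```python
def profiles(tasks):
    """Return (resp, procp) as lists indexed by integer time step.

    Assumption of this sketch: every distinct non-empty prefix of a task
    is one action, running in the time step given by its length minus 1.
    """
    prefixes = set()
    for task in tasks:
        for end in range(1, len(task) + 1):
            prefixes.add(task[:end])
    duration = max(len(task) for task in tasks)
    resp = [0] * duration
    procp = [0] * duration
    for prefix in prefixes:
        t = len(prefix) - 1
        resp[t] += 1                 # every action occupies a resource
        if prefix[-1] in "FPR":
            procp[t] += 1            # F, P and R actions occupy a process
    return resp, procp


# F^2 PR + C^2 F PR + C^4 PR
resp, procp = profiles(["FFPR", "CCFPR", "CCCCPR"])
print(resp, max(resp))      # resources profile and res f
print(procp, max(procp))    # processes profile and proc f
```

The usage values of Definition 24 are then simply `max(resp)` and `max(procp)`.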

Definition 25 (Optimization problem, reach maximal). Given $r, p \in \mathbb{Z}_{\ge 0}$, find a FPRS $f$
with

$\operatorname{res} f \le r$
$\operatorname{proc} f \le p$

for which

$\ell(f)$ is maximal

Definition 26 (Optimization problem, resource minimal). Given $\ell, p \in \mathbb{Z}_{\ge 0}$, find a
FPRS $f$ with

$\ell(f) \le \ell$
$\operatorname{proc} f \le p$

for which

$\operatorname{res} f$ is minimal


2.3. Decomposition and Composition 

Under the assumption of processor-checkpoint convertibility, two important properties 
were proved by previous research, checkpoint persistence [Wal99, p. 64] and processor 
persistence [Wal99, p. 65]. Those led to the theorem of binary decomposition [Wal99, 
p. 66], which was the base of all further analysis. 

With relaxed processor-checkpoint convertibility, this line of reasoning is no longer 
applicable. Although we can show that checkpoint persistence still holds, there is no 
evidence that processor persistence holds. Finally, we will describe its counterpart, the 
composition [Wal99, p. 74], which is still applicable.

Definition 27 (Checkpoint-persistent schedule). A schedule $f$ is checkpoint-persistent
if its maximally left-factored representation contains only the following subtasks, where
$n \in \mathbb{Z}_{\ge 1}$, $w \in A^*$:

$C^n$
$C^n PR$
$F w$
$PR$

In other words: Once a subtask starts with a C action, it will continue with C actions 
until the next fork. 

Example 28. The following schedule $f$ is not checkpoint-persistent. The violating
subtask is underlined and marked in the visual schedule.

$f = F(F^3(FPR + C^2PR) + \underline{C^3FC}(FPR + C^2PR)) + C^8(FPR + C^2PR)$



Definition 29 (Processor-persistent schedule). A schedule $f$ is processor-persistent
if its maximally left-factored representation contains only the following subtasks, where
$n \in \mathbb{Z}_{\ge 1}$, $w \in A^*$:

$F^n$
$F^n PR$
$C w$
$PR$

In other words: Once a subtask starts with an F action, it will continue with F actions 
until the next fork.

Example 30. The following schedule $f$ is not processor-persistent. The violating
subtask is underlined and marked in the visual schedule.

$f = F^4(FPR + C^2PR) + C^3(\underline{F^2CFPR} + C^3(F(FPR + C^2PR) + C^4PR))$





Theorem 31 (Checkpoint persistence). Let $f$ be a FPRS of reach $\ell$ that uses $p$ processes
and $k$ resources. Then there exists a checkpoint-persistent FPRS $\tilde f$ of the same reach $\ell$.
Moreover, $\tilde f$ uses no more than $p$ processes and $k$ resources.

Corollary 32. In the search for optimal FPRS it is sufficient to restrict the search to 
checkpoint-persistent schedules. 

Proof. This proof implements the same idea as in [Wal99, p. 64] , but is expressed through 
the notions of our schedule algebra. 

If $f$ is checkpoint-persistent, we set $\tilde f = f$. Otherwise, $f$ is not checkpoint-persistent.
That is, within its maximally left-factored representation $f$ contains a violating subtask:

$C^n F$ with $n \in \mathbb{Z}_{\ge 1}$

Let $w \in A^*$ be the preceding task, let $h_1 \in \mathbb{Z}\langle A \rangle$ be the following subschedule, let
$F h_2 \in \mathbb{Z}\langle A \rangle$ be the possibly existing other fork after $w$, and $g \in \mathbb{Z}\langle A \rangle$ the remaining
schedule, that is, the sum of all tasks in $f$ that do not have $w$ as prefix. (Note that it is
possible that $w = \lambda$, $h_2 = 0$ and $g = 0$.) Then:

$f = w(C^n F h_1 + F h_2) + g$

Now we apply the following transformation of $f$, which fixes this particular violation:

$w(C^n F h_1 + F h_2) + g \;\mapsto\; w F(C^n h_1 + h_2) + g$

At time $|w|$, the new schedule starts one fewer C action. At time $|w| + n$, it starts one
fewer F action and possibly one more C action. (It won't if $h_2$ starts with $C^n$.) At all
other points in time, both schedules have identical usage. So the new schedule uses no
more than $p$ processes and $k$ resources.

This transformation can be iterated until all violations of checkpoint-persistence are 
rectified. Within each iteration a C action moved to the right, but there is never any C 
action moved to the left. So this cannot continue infinitely.

Let $\tilde f$ be the final result, after which no further iteration is possible. $\tilde f$ must be
checkpoint-persistent, otherwise one more iteration would be possible. Moreover, $\tilde f$
uses no more than $p$ processes and $k$ resources.

It remains to show that $\tilde f$ is a FPRS. Representing $f$ as of Definition 21, the transformation
preserves $|w_i|$ and $c_F(w_i)$ for all tasks $w_i$, hence it also preserves the duration $|f|$.
Since it doesn't add or remove any task, it preserves the reach $\ell$. So the transformation
preserves FPRS. Since $\tilde f$ results from iterated transformation, it is also a FPRS. □

Example 33. Figure 4 shows a schedule $f$ that is not checkpoint-persistent and the
corresponding checkpoint-persistent schedule $\tilde f$ after one iteration of the transformation
described in the proof.

$f = F(F^3(FPR + C^2PR) + \underline{C^3FC}(FPR + C^2PR)) + C^8(FPR + C^2PR)$

$\tilde f = FF(F^2(FPR + C^2PR) + C^4(FPR + C^2PR)) + C^8(FPR + C^2PR)$

Figure 4.: Establishing checkpoint persistence.


Remark. The analogous transformation for processor persistence is:

$w(F^n C h_1 + C h_2) + g \;\mapsto\; w C(F^n h_1 + h_2) + g$

At time $|w|$, the new schedule starts one fewer F action. At time $|w| + n$, it starts one
fewer C action and possibly one more F action. (It won't if $h_2$ starts with $F^n$.) If the
original schedule already used $p$ processes at time $|w| + n$, and $h_2$ does not start with $F^n$,
the new schedule will use $p + 1$ processes.

This means that we cannot apply this transformation in general, which is why the
proof of [Wal99, p. 65] is not applicable here.

Example 34. Figure 5 demonstrates the worst case. It shows a schedule $f$ that is not
processor-persistent and the corresponding processor-persistent schedule $\tilde f$ after four
iterations of the transformation. $\tilde f$ uses more processes than $f$.

$f = F^4PR + C(F^2C(FPR + C^2PR) + C^2(FC^3PR + C^5PR))$

$\tilde f = F^4PR + CC(F^2(FPR + C^2PR) + CCCC(FPR + C^2PR))$

Figure 5.: Establishing processor persistence after four iterations. Schedule $f$ uses $p = 2$
processes, while $\tilde f$ uses $p = 3$ processes.

Definition 35 (Binary-decomposable schedule). A schedule $f$ is a binary-decomposable
schedule if it is checkpoint-persistent and processor-persistent.

That is, the maximally left-factored representation of $f$ contains only the following
tasks, where $n \in \mathbb{Z}_{\ge 1}$:

$F^n$
$F^n PR$
$C^n$
$C^n PR$
$PR$

Example 36. The schedules $\tilde f$ in Figures 4 and 5 are both binary-decomposable.

Remark. A binary-decomposable schedule is fully determined by its forks, which is a
convenient property. However, due to the lack of processor persistence, we cannot assume
that optimal FPRS are binary-decomposable in general. Nevertheless, we can apply the
inverse operation, binary composition, and will use that extensively to generate optimal
and suboptimal schedules in Chapter 4.

Definition 37 (Primitive schedule). The elementary schedule $\varepsilon$ is:

$\varepsilon = PR$

Definition 38 (Schedule composition). The composition $(f, g)$ of two FPRS $f$ and $g$ is:

$(f, g) = F^{\ell(g)} f + C^{2\ell(f)} g$   (9)

Remark. The composition $(f, g)$ is a FPRS with reach $\ell((f, g)) = \ell(f) + \ell(g)$ and
duration $|(f, g)| = |f| + |g|$.

Example 39. The composition of the following schedules:

$f = FPR + C^2PR$, $\ell(f) = 2$

$g = F^2PR + CFCPR + C^4PR$, $\ell(g) = 3$

yields:

$(f, g) = F^3(FPR + C^2PR) + C^4(F^2PR + CFCPR + C^4PR)$
$\phantom{(f, g)} = F^4PR + F^3C^2PR + C^4F^2PR + C^5FCPR + C^8PR$

$|(f, g)| = 10$



Proposition 40. If $f$ and $g$ are binary-decomposable, their composition $(f, g)$ is
binary-decomposable, too.

Proof. Let $f$ and $g$ be maximally factored to the left, then their subtasks are only the
ones listed in Definition 35. The composition $(f, g)$ adds two more subtasks to this, which
are both allowed by Definition 35. □
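
Composition as in Definition 38 lends itself to a short enumeration. The sketch below assumes schedules are tuples of task strings, with `EPS` standing for the elementary schedule $\varepsilon = PR$; the names `compose` and `compositions` are hypothetical helpers introduced here. It builds every schedule of a given reach that can be obtained from $\varepsilon$ by repeated binary composition.

```python
EPS = ("PR",)  # the elementary schedule, reach 1


def compose(f, g):
    """(f, g) = F^{l(g)} f + C^{2 l(f)} g, with reach = number of tasks."""
    head = "F" * len(g)
    tail = "C" * (2 * len(f))
    return tuple(head + task for task in f) + tuple(tail + task for task in g)


def compositions(reach):
    """All schedules of the given reach built from EPS by composition."""
    if reach == 1:
        return [EPS]
    result = []
    for k in range(1, reach):
        for f in compositions(k):
            for g in compositions(reach - k):
                result.append(compose(f, g))
    return result


for n in range(1, 5):
    print(n, len(compositions(n)))

f = ("FPR", "CCPR")
g = ("FFPR", "CFCPR", "CCCCPR")
print(compose(f, g))   # the composition of Example 39
```

Each composed schedule has duration twice its reach, in line with the Remark after Definition 38.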

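Example 41. All binary-decomposable schedules up to reach $\ell = 4$ are depicted in
Figure 6.

$\varepsilon = PR$

$(\varepsilon, \varepsilon) = FPR + C^2PR$

$((\varepsilon, \varepsilon), \varepsilon) = F^2PR + FC^2PR + C^4PR$

$(\varepsilon, (\varepsilon, \varepsilon)) = F^2PR + C^2FPR + C^4PR$

$(((\varepsilon, \varepsilon), \varepsilon), \varepsilon) = F^3\varepsilon + F^2C^2\varepsilon + FC^4\varepsilon + C^6\varepsilon$

$((\varepsilon, (\varepsilon, \varepsilon)), \varepsilon) = F^3\varepsilon + FC^2F\varepsilon + FC^4\varepsilon + C^6\varepsilon$

$((\varepsilon, \varepsilon), (\varepsilon, \varepsilon)) = F^3\varepsilon + F^2C^2\varepsilon + C^4F\varepsilon + C^6\varepsilon$

$(\varepsilon, ((\varepsilon, \varepsilon), \varepsilon)) = F^3\varepsilon + C^2F^2\varepsilon + C^2FC^2\varepsilon + C^6\varepsilon$

$(\varepsilon, (\varepsilon, (\varepsilon, \varepsilon))) = F^3\varepsilon + C^2F^2\varepsilon + C^4F\varepsilon + C^6\varepsilon$

Figure 6.: Small schedule compositions up to reach $\ell = 4$.

3. Algebra on Profiles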


In this chapter we concentrate on the profile of a schedule, that is, the function that 
describes how many resources or processes are used at each point in time. Profiles 
are essentially step functions (piecewise constant functions), but we describe them in a 
slightly different way that will fit our purpose better. 

Our main interest is to find a convenient description of how the profiles combine when 
two schedules are combined. Also, we will be able to describe exactly the difference 
between the two possibilities to combine three schedules. The main goal is, of course,
to describe upper limits of profiles. That is, we want to verify that our schedules do not
exceed the number of processes and resources available.

While no formal language is ever perfect, the formalism developed here is at least 
powerful enough such that most lemmas can be proven solely by mechanically applying 
transformation rules (equations) of previous lemmas. Although much effort went into the 
proofs to make them concise, most could have been developed by simply expanding all 
terms as much as possible, then comparing the results. To provide a complete formalism, 
all possible operator relationships are studied, even if they do not contribute directly to 
a larger theorem. Some interesting connections between the algebra developed here and 
polynomials are described in Appendix C. 


3.1. Profile Space 

Definition 42 (Primitive functions). For every $t \in \mathbb{R}_{\ge 0}$, the primitive function $\varphi_t$ is:

$\varphi_t : \mathbb{R} \to \mathbb{R}$
$\varphi_t = \chi_{[t,\infty)}$

where $\chi_A$ denotes the indicator function as listed in Section 1.5. Further, we define $B$
to be the set of primitive functions:

$B = \{\varphi_t \mid t \in \mathbb{R}_{\ge 0}\}$

Definition 43 (Profile space). Let $(\mathbb{R}^{\mathbb{R}}, +, \cdot, \le)$ be the partially ordered $\mathbb{R}$-linear space ¹
of all functions $\mathbb{R} \to \mathbb{R}$, where $+$, $\cdot$ and $\le$ operate pointwise, as usual.

Within that space, we define the $\mathbb{R}$-linear subspace of profiles $\mathbb{P} \subseteq \mathbb{R}^{\mathbb{R}}$ to be the linear
span of all primitive functions:

$\mathbb{P} = \operatorname{span}_{\mathbb{R}}(B)$

That is, for each profile $f \in \mathbb{P}$ there exist $t_1, \dots, t_n \in \mathbb{R}_{\ge 0}$ and $a_1, \dots, a_n \in \mathbb{R}$ such that:

$f : \mathbb{R} \to \mathbb{R}$
$f = \sum_{i=1}^{n} a_i \varphi_{t_i}$

1 A partially ordered linear space is a preordered linear space whose preorder $\le$ is also a partial order.
A preordered linear space is an $\mathbb{R}$-linear space $V$ with a compatible preorder $\le$. That is, for all
$x, y, z \in V$, $\lambda \in \mathbb{R}$: $x \le y \implies x + z \le y + z$ and $x \ge 0, \lambda \ge 0 \implies \lambda \cdot x \ge 0$. For every set $A$, the
$\mathbb{R}$-linear space $\mathbb{R}^A$ of functions $A \to \mathbb{R}$, equipped with the pointwise defined partial order $\le$, forms a
partially ordered linear space. [Bou87, p. II.12]

Remark. For the problem at hand, it would have been sufficient to restrict profiles to 
integer arguments and values. 2 However, for the algebra introduced here that wouldn’t 
simplify much. It would also complicate the usage of the arrangement operation as 
of Definition 59, where we divide by 2. So we will build this slightly more general 
framework, which is also capable of handling schedules with non-integer times as well as 
non-integer resource usage. 

Remark. Note that although we don't consider negative time values, the profile functions
are formally defined for those, too. Since we don't include any primitive functions
$\varphi_t$ with $t < 0$, all profile functions are zero for negative times. This may seem strange, but
prevents many edge cases in the following definitions and proofs, especially Definition 49
and Lemma 50.

Lemma 44 (Profile values). For every $f \in \mathbb{P}$, $f = \sum_{i=1}^{n} a_i \varphi_{t_i}$ with $t_1 < \dots < t_n$, the
function values are: ³ ⁴

$t \in [t_k, t_{k+1}) \implies f(t) = \sum_{i=1}^{k} a_i$ for all $k \in \{0, \dots, n\}$,

with $t_0 = -\infty$ and $t_{n+1} = \infty$. That is:

$$
f(t) =
\begin{cases}
0 & t \in (-\infty, t_1) \\
a_1 & t \in [t_1, t_2) \\
a_1 + a_2 & t \in [t_2, t_3) \\
\;\vdots \\
a_1 + a_2 + \dots + a_{n-1} & t \in [t_{n-1}, t_n) \\
a_1 + a_2 + \dots + a_{n-1} + a_n & t \in [t_n, \infty)
\end{cases}
$$

2 We would then have $\varphi_t : \mathbb{Z} \to \mathbb{Z}$ with $t \in \mathbb{Z}_{\ge 0}$. Also, we would be dealing with free $\mathbb{Z}$-modules instead
of linear spaces over $\mathbb{R}$.

3 We follow the convention that the empty sum evaluates to zero: $\sum_{i=1}^{0} a_i = 0$.

4 Note that the first condition $t \in (-\infty, t_1)$ is equivalent to $t \in [-\infty, t_1)$, because $t \in \mathbb{R}$.

Proof. First of all, note that $t_0$ and $t_{n+1}$ are conveniently defined such that:

$t_0 < t_1 < \dots < t_n < t_{n+1}$

Now, let $k \in \{0, \dots, n\}$ and $t \in [t_k, t_{k+1})$. It follows that:

$$
\begin{aligned}
f(t) &= \sum_{i=1}^{n} a_i \cdot \varphi_{t_i}(t) \\
&= \sum_{i=1}^{n} a_i \cdot \chi_{[t_i,\infty)}(t) \\
&= \sum_{i=1}^{k} a_i \cdot \underbrace{\chi_{[t_i,\infty)}(t)}_{t \ge t_i} + \sum_{i=k+1}^{n} a_i \cdot \underbrace{\chi_{[t_i,\infty)}(t)}_{t < t_i} \\
&= \sum_{i=1}^{k} a_i \cdot 1 + \sum_{i=k+1}^{n} a_i \cdot 0 \\
&= \sum_{i=1}^{k} a_i
\end{aligned}
$$

□

Remark. Lemma 44 shows that profiles are step functions whose step intervals are closed
on the left and open on the right.

However, our representation deviates from the classic representation in that our base
intervals are not disjoint, but all overlapping towards $+\infty$. This causes an implicit
summation of the coefficients, which is intentional and will simplify a lot.

For example, compare the following representations of the resources profile of schedule
$(\varepsilon, (\varepsilon, \varepsilon))$ as shown in Figure 7. The coefficients in the classic disjoint representation
are the partial sums of the coefficients in our representation:

$\operatorname{resp}(\varepsilon, (\varepsilon, \varepsilon)) = 2\varphi_0 + \varphi_2 - \varphi_4 - \varphi_5 - \varphi_6$   (ours)
$\phantom{\operatorname{resp}(\varepsilon, (\varepsilon, \varepsilon))} = 2\chi_{[0,2)} + 3\chi_{[2,4)} + 2\chi_{[4,5)} + \chi_{[5,6)}$   (classic)

Our representation enables us to think of the term $a\varphi_t$ as an event that happens at
time $t$ and from then on will change resource usage by $a$. So $\varphi_2$ could describe the
start of some computation at time 2 that consumes one resource unit. Then, $-\varphi_5$ could
describe the end of that computation at time 5, freeing the one resource unit. Their
sum, $\varphi_2 - \varphi_5$, would then describe a schedule in which exactly these two events happen,
and nothing else. This is visualized in Figure 7.
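
The relation between the event representation and the classic disjoint representation can be made concrete. The following sketch assumes a profile $\sum a_i \varphi_{t_i}$ is stored as a dictionary mapping each time $t_i$ to its coefficient $a_i$; this encoding and the names `value_at` and `classic_steps` are assumptions of the sketch, not part of the thesis.

```python
from itertools import accumulate


def value_at(profile, t):
    """f(t): sum of all coefficients a_i whose event time t_i is <= t."""
    return sum(a for u, a in profile.items() if u <= t)


def classic_steps(profile):
    """Convert {t_i: a_i} into disjoint steps (start, end, value).

    The step values are the partial sums of the coefficients (Lemma 44).
    """
    times = sorted(t for t, a in profile.items() if a != 0)
    values = list(accumulate(profile[t] for t in times))
    steps = []
    for k, start in enumerate(times):
        end = times[k + 1] if k + 1 < len(times) else float("inf")
        steps.append((start, end, values[k]))
    return steps


# 2*phi_0 + phi_2 - phi_4 - phi_5 - phi_6
example = {0: 2, 2: 1, 4: -1, 5: -1, 6: -1}
for step in classic_steps(example):
    print(step)
```

For the example profile the step values are the partial sums 2, 3, 2, 1 and finally 0 on $[6, \infty)$, matching the classic representation given in the Remark.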


Corollary 45 (Profile image). For every $f \in \mathbb{P}$, represented as in Lemma 44, the image of $f$ is:

$\{f(t) \mid t \in \mathbb{R}\} = \left\{ \sum_{i=1}^{k} a_i \;\middle|\; k = 0, \dots, n \right\}$

Figure 7.: Simple profile functions. The four diagrams show $\varphi_2$, $-\varphi_5$, $\varphi_2 - \varphi_5$ and
$2\varphi_0 + \varphi_2 - \varphi_4 - \varphi_5 - \varphi_6$ over the time range 0 to 10. The last diagram describes the
resources profile of schedule $(\varepsilon, (\varepsilon, \varepsilon))$ as depicted in Figure 6.





Lemma 46. The primitive functions as of Definition 42

$B = \{\varphi_t \mid t \in \mathbb{R}_{\ge 0}\}$

form a basis of the $\mathbb{R}$-linear space $\mathbb{P}$.

Proof. According to Definition 43, $B$ spans $\mathbb{P}$. It remains to show that $B$ is linearly
independent.

Assume there is a nontrivial linear dependence between elements of $B$. That is, there
exist $t_1 < \dots < t_n$ and $a_1, \dots, a_n \ne 0$ such that:

$0 = \sum_{i=1}^{n} a_i \varphi_{t_i}$

Let $f \in \mathbb{P}$ be the right-hand side. Evaluating $f$ at $t_1$ and applying Lemma 44 leads to:

$0 = f(t_1) = \sum_{i=1}^{1} a_i = a_1$

which contradicts the assumption. Hence, $B$ is linearly independent and thus a basis. □

Motivation. In Definition 43 we equipped $\mathbb{P}$ with a partial order $\le$. This serves two
important practical needs:

1. To verify that a schedule doesn't release more resources than it uses, we check that
its profile $f \in \mathbb{P}$ satisfies $f \ge 0$.

2. To verify that a schedule's resource usage doesn't exceed $r \in \mathbb{R}_{\ge 0}$, we check that
its profile $f \in \mathbb{P}$ satisfies $f \le r \cdot \varphi_0$.

Since it is so useful, we will have a closer look at how to actually check the $\le$ relation
within our algebra. Note that it is sufficient to know how to check $f \ge 0$, because every
other relation $f \ge g$ can be checked by calculating $f - g$ and checking if $(f - g) \ge 0$.

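Lemma 47. For every $f \in \mathbb{P}$, $f = \sum_{i=1}^{n} a_i \varphi_{t_i}$ with $t_1 < \dots < t_n$, nonnegativity can be
verified as follows:

$f \ge 0 \iff \sum_{i=1}^{k} a_i \ge 0 \quad \forall k = 1, \dots, n$

Proof. This is a direct consequence of Corollary 45:

$\{f(t) \mid t \in \mathbb{R}\} = \left\{ \sum_{i=1}^{k} a_i \;\middle|\; k = 0, \dots, n \right\} = \{0\} \cup \left\{ \sum_{i=1}^{k} a_i \;\middle|\; k = 1, \dots, n \right\}$

That is, the sums describe exactly all function values $f(t)$, so it is necessary
and sufficient to check that those are nonnegative. We omit the check of the zero
value (for $k = 0$), as we already know it is nonnegative. □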
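
The partial-sum criterion of Lemma 47 translates into a few lines of code. This is a sketch under the assumption that a profile is stored as a dictionary from event time $t_i$ to coefficient $a_i$; `is_nonnegative` and `within_limit` are hypothetical helper names. The second function reduces the check $f \le r \cdot \varphi_0$ to a nonnegativity check of $r \cdot \varphi_0 - f$, as described in the Motivation.

```python
from itertools import accumulate


def is_nonnegative(profile):
    """f >= 0 iff all partial sums of the time-ordered coefficients are >= 0."""
    times = sorted(profile)
    return all(s >= 0 for s in accumulate(profile[t] for t in times))


def within_limit(profile, r):
    """f <= r * phi_0, checked as (r * phi_0 - f) >= 0."""
    diff = {t: -a for t, a in profile.items()}
    diff[0] = diff.get(0, 0) + r
    return is_nonnegative(diff)


usage = {0: 2, 2: 1, 4: -1, 5: -1, 6: -1}
print(is_nonnegative(usage))     # never releases more than it uses
print(within_limit(usage, 3))    # peak usage fits into 3 resource units
print(within_limit(usage, 2))    # but not into 2
```

Because the dictionary keys are distinct, sorting them yields the strictly increasing times $t_1 < \dots < t_n$ that the lemma requires.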

Lemma 48. The $\le$ relation satisfies the following properties for all $t, u \in \mathbb{R}_{\ge 0}$:

$\varphi_t \ge 0$   (10)

$\varphi_t \ge \varphi_u$ for $t \le u$   (11)

Proof. These properties are quite obvious from Definition 42 by looking at the underlying
indicator functions for $\varphi_t$ and $\varphi_u$. However, this lemma can also be proved purely within
the algebraic framework, by applying Lemma 47.

Consider the following profiles $f, g \in \mathbb{P}$:

$f = 1 \cdot \varphi_t$

$g = 1 \cdot \varphi_t + (-1) \cdot \varphi_u$

Applying Lemma 47 to $f$, we know that $f \ge 0$ because $1 \ge 0$. This proves (10).

We then observe that for $t = u$, (11) is obviously true. So we assume $t < u$. This
enables us to apply Lemma 47 to $g$. We conclude that $g \ge 0$ because $1 \ge 0$ and
$1 + (-1) \ge 0$. So $\varphi_t - \varphi_u \ge 0$, which proves (11). □


3.2. Shift, Duration and Final Value 

Definition 49 (Shift operation). For every $f \in \mathbb{R}^{\mathbb{R}}$ and $a \in \mathbb{R}_{\ge 0}$ we define $f \gg a$
("$f$ shifted by $a$") to be the following function:

$f \gg a : \mathbb{R} \to \mathbb{R}$
$f \gg a : t \mapsto f(t - a)$

The shift operator has a stronger binding than addition, such that $f \gg a + g \gg b$ means
$(f \gg a) + (g \gg b)$.

Lemma 50. For all $f, g \in \mathbb{R}^{\mathbb{R}}$, $a, b \in \mathbb{R}_{\ge 0}$ and $v \in \mathbb{R}$, the following identities hold:

$0 \gg a = 0$   (12)
$f \gg 0 = f$   (13)
$(f \gg a) \gg b = f \gg (a + b)$   (14)
$\varphi_a \gg b = \varphi_{a+b}$   (15)
$(f + g) \gg a = f \gg a + g \gg a$   (16)
$(v \cdot f) \gg a = v \cdot (f \gg a)$   (17)
$f \gg a \le g \gg a$ for $f \le g$   (18)

Proof. For all $t \in \mathbb{R}$, we have:

$[0 \gg a](t) = [0](t - a) = 0$
$[f \gg 0](t) = f(t - 0) = f(t)$
$[(f \gg a) \gg b](t) = [f \gg a](t - b) = f((t - b) - a) = f(t - (a + b)) = [f \gg (a + b)](t)$
$[\varphi_a \gg b](t) = \varphi_a(t - b) = \chi_{[a,\infty)}(t - b) = \chi_{[a+b,\infty)}(t) = \varphi_{a+b}(t)$
$[(f + g) \gg a](t) = [f + g](t - a) = f(t - a) + g(t - a) = [f \gg a](t) + [g \gg a](t)$
$[(v \cdot f) \gg a](t) = [v \cdot f](t - a) = v \cdot f(t - a) = v \cdot [f \gg a](t) = [v \cdot (f \gg a)](t)$

Also, we have:

$$
\begin{aligned}
f \le g &\implies f(t) \le g(t) && \forall t \in \mathbb{R} \\
&\implies f(t - a) \le g(t - a) && \forall t \in \mathbb{R} \\
&\implies [f \gg a](t) \le [g \gg a](t) && \forall t \in \mathbb{R} \\
&\implies (f \gg a) \le (g \gg a)
\end{aligned}
$$

□

Lemma 51 (Shift closure). $\mathbb{P}$ is closed under $\gg$. That is, for each $f \in \mathbb{P}$ and $a \in \mathbb{R}_{\ge 0}$,

$(f \gg a) \in \mathbb{P}$

Proof. For each $a \in \mathbb{R}_{\ge 0}$, consider the following map:

$m_a : \mathbb{R}^{\mathbb{R}} \to \mathbb{R}^{\mathbb{R}}$
$m_a : f \mapsto f \gg a$

First, $m_a$ maps the basis $B$ of $\mathbb{P}$ into itself by (15). Also, $m_a$ is $\mathbb{R}$-linear according to (16)
and (17). Hence, $\mathbb{P}$ is closed under $m_a$. □

Definition 52 (Duration and final value). For every $f \in \mathbb{P}$ we define the duration of $f$,
denoted as $|f|$, to be:

$$
|f| =
\begin{cases}
0 & f = 0 \\
t_n & f = a_1 \varphi_{t_1} + \dots + a_n \varphi_{t_n} \text{ with } t_1 < \dots < t_n \text{ and } a_n \ne 0
\end{cases}
$$

Also, we say that $f(|f|)$ is the final value of $f$.

Remark. In other words, $|f|$ is the time of the last event that affects the profile. If $f$ is
the profile of a schedule where all used resources are released at the end, the final value
is $f(|f|) = 0$.

Lemma 53. For all $f, g \in \mathbb{P}$, $v \in \mathbb{R}$, $a \in \mathbb{R}_{\ge 0}$, the following statements hold:

$|\varphi_a| = a$   (19)
$|f + g| \le \max(|f|, |g|)$   (20)
$|v \cdot f| = |f|$ for $v \ne 0$   (21)
$|f \gg a| \le |f| + a$   (22)
$|f \gg a| = |f| + a$ for $f \ne 0$   (23)

Proof. First we get the trivial cases out of the way:

• Representing $\varphi_a = 1 \cdot \varphi_a$, we see that (19) follows directly from Definition 52.

• If $f = 0$ or $g = 0$, then (20) is obviously true.

• If $f = 0$, then (21) and (22) are obviously true.

It remains to show that (20) to (23) hold for $f \ne 0$ and $g \ne 0$, which allows us to
represent $f$ and $g$ as follows (largest terms underlined):

$f = a_1 \cdot \varphi_{t_1} + \dots + \underline{a_n \cdot \varphi_{t_n}}$ with $t_1 < \dots < t_n$ and $a_n \ne 0$

$g = b_1 \cdot \varphi_{u_1} + \dots + \underline{b_m \cdot \varphi_{u_m}}$ with $u_1 < \dots < u_m$ and $b_m \ne 0$

So $|f| = t_n$ and $|g| = u_m$. Since $v \ne 0$ and $a_n \ne 0$, we have:

$v a_n \ne 0$

This enables us to prove (21):

$|v \cdot f| = |v a_1 \cdot \varphi_{t_1} + \dots + v a_n \cdot \varphi_{t_n}| = t_n = |f|$

Since $t_1 < \dots < t_n$, we have:

$t_1 + a < \dots < t_n + a$

This enables us to prove (22) and (23):

$|f \gg a| = |a_1 \cdot \varphi_{t_1 + a} + \dots + a_n \cdot \varphi_{t_n + a}| = t_n + a = |f| + a$

For (20), we distinguish the following cases: ⁵

$t_n > u_m \implies |f + g| = |\dots + a_n \cdot \varphi_{t_n}| = t_n$
$t_n < u_m \implies |f + g| = |\dots + b_m \cdot \varphi_{u_m}| = u_m$
$t_n = u_m$ and $a_n + b_m \ne 0 \implies |f + g| = |\dots + (a_n + b_m) \cdot \varphi_{t_n}| = t_n$
$t_n = u_m$ and $a_n + b_m = 0 \implies |f + g| = |\dots + 0 \cdot \varphi_{t_n}| \le t_n$

In all cases it follows that:

$|f + g| \le \max(t_n, u_m) = \max(|f|, |g|)$ □

5 Note that in the last case, $\le$ cannot be replaced with $<$. When the last terms cancel out, we may run
into the special case of the zero profile. This effect leads to equality whenever $t_n = 0$. For example,
consider $f = 2\varphi_0$ and $g = -2\varphi_0$. Here, $t_n = 0$, $u_m = 0$, $a_n = 2$, $b_m = -2$ and $f + g = 0$. So we are
in the case $t_n = u_m$ and $a_n + b_m = 0$, yet we don't have $|f + g| < t_n$, but $|f + g| = t_n$.

3.3. Complete Profiles

Motivation. When working with our algebra, (20) of Lemma 53 is somewhat unsatisfactory:

$|f + g| \le \max(|f|, |g|)$

First, this is merely an inequality and doesn't provide a way to calculate the exact
duration of two added profiles. Second, when two schedules run in parallel, we expect
that their total duration doesn't shrink. So it runs against our intuition that there exist
profiles $f$ and $g$ such that

$|f + g| < \max(|f|, |g|)$

For these reasons we introduce $\mathbb{P}^*$, the set of complete profiles, which is a convenient
subset of $\mathbb{P}$ where the desired equality always holds:

$|f + g| = \max(|f|, |g|)$

However, this comes at a price, as $\mathbb{P}^*$ is not closed under all operations. So when
calculating with complete profiles, we must be aware that some intermediate results
may slip into $\mathbb{P} \setminus \mathbb{P}^*$. To get the best of both worlds, we try to stay within $\mathbb{P}^*$ for as long
as possible, but leave it when necessary.

Definition 54 (Complete profiles). We say that a profile $f \in \mathbb{P}$ is complete if it is
nonnegative and its final value is zero:

$f \ge 0$ and $f(|f|) = 0$

We define $\mathbb{P}^*$ to be the set of complete profiles, that is:

$\mathbb{P}^* = \{f \in \mathbb{P} \mid f \ge 0 \text{ and } f(|f|) = 0\}$

Remark. These two conditions are very natural and satisfied by any profile that corresponds
to a real schedule.

1. $f \ge 0$ means that there are never more resources released than used.

2. $f(|f|) = 0$ means that all used resources are released at the end.

Profiles that fail to meet these conditions are considered incomplete in the sense that
they seem to describe only a part of a schedule.

1. If $f \not\ge 0$, there is a time range of negative resource usage. That is, there are more
resources released than actually in use. We say that $f$ is incomplete, because it
seems the first part of the schedule is missing, containing events that start using
those resources.

2. If $f \ge 0$, but $f(|f|) \ne 0$, the profile ends with a positive resource usage, using
those forever. We say that $f$ is incomplete, because it seems the second part of the
schedule is missing, containing events that finish using those resources.

Lemma 55. For all $f, g \in \mathbb{P}^*$ the duration of their sum can be calculated via:

$|f + g| = \max(|f|, |g|)$   (24)

Proof. If $f = 0$ or $g = 0$, then (24) is obviously true. If $f \ne 0$ and $g \ne 0$, we can
represent $f$ and $g$ as follows (largest terms underlined):

$f = a_1 \cdot \varphi_{t_1} + \dots + \underline{a_n \cdot \varphi_{t_n}}$ with $t_1 < \dots < t_n$ and $a_n \ne 0$

$g = b_1 \cdot \varphi_{u_1} + \dots + \underline{b_m \cdot \varphi_{u_m}}$ with $u_1 < \dots < u_m$ and $b_m \ne 0$

So $|f| = t_n$ and $|g| = u_m$. It follows that $f(t_n) = f(|f|) = 0$.

We know that $n \ne 1$, because for $n = 1$ we would have $0 = f(t_n) = f(t_1) = a_1 = a_n$,
which would contradict $a_n \ne 0$.

So $n \ge 2$, hence we can apply Lemma 44 to $t_n$ as well as $t_{n-1}$:

$f(t_n) = \sum_{i=1}^{n} a_i \qquad f(t_{n-1}) = \sum_{i=1}^{n-1} a_i$

Using $f \ge 0$, it follows that:

$a_n = \sum_{i=1}^{n} a_i - \sum_{i=1}^{n-1} a_i = f(t_n) - f(t_{n-1}) = -f(t_{n-1}) \le 0$

Since $a_n \ne 0$, we conclude $a_n < 0$. With similar reasoning, we conclude $b_m < 0$. Hence,

$a_n + b_m \ne 0$

Finally, we distinguish the following cases:

$t_n > u_m \implies |f + g| = |\dots + a_n \cdot \varphi_{t_n}| = t_n = \max(t_n, u_m)$
$t_n < u_m \implies |f + g| = |\dots + b_m \cdot \varphi_{u_m}| = u_m = \max(t_n, u_m)$
$t_n = u_m \implies |f + g| = |\dots + (a_n + b_m) \cdot \varphi_{t_n}| = t_n = \max(t_n, u_m)$

In all cases it follows that:

$|f + g| = \max(t_n, u_m) = \max(|f|, |g|)$ □
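
The difference between (20) and (24) can be observed on small examples. The sketch below assumes profiles are stored as dictionaries from event time to coefficient; the helper names `add`, `duration` and `is_complete`, and the sample profiles, are chosen for illustration only. Completeness is tested via partial sums, using that the final value $f(|f|)$ equals the sum of all coefficients.

```python
from itertools import accumulate


def add(f, g):
    """Pointwise sum of two profiles, dropping cancelled events."""
    total = dict(f)
    for t, a in g.items():
        total[t] = total.get(t, 0) + a
    return {t: a for t, a in total.items() if a != 0}


def duration(f):
    """|f|: time of the last event with nonzero coefficient, 0 for f = 0."""
    times = [t for t, a in f.items() if a != 0]
    return max(times) if times else 0


def is_complete(f):
    """f >= 0 and final value zero, both read off the partial sums."""
    sums = list(accumulate(f[t] for t in sorted(f)))
    return all(s >= 0 for s in sums) and (not sums or sums[-1] == 0)


# Two complete profiles: chi_[0,2) and chi_[1,5)
f, g = {0: 1, 2: -1}, {1: 1, 5: -1}
print(duration(add(f, g)), max(duration(f), duration(g)))

# Two incomplete profiles whose last events cancel
f2, g2 = {0: 1, 3: 1}, {3: -1}
print(duration(add(f2, g2)), max(duration(f2), duration(g2)))
```

For the complete pair the duration of the sum equals the maximum, while for the incomplete pair the sum is $\varphi_0$ with duration 0, strictly below the maximum 3.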

Lemma 56. $\mathbb{P}^*$ is closed under conical combination ⁶ and shift operation. That is,

$f + g \in \mathbb{P}^*$   (25)
$v \cdot f \in \mathbb{P}^*$ for $v \ge 0$   (26)
$f \gg a \in \mathbb{P}^*$   (27)

for all $f, g \in \mathbb{P}^*$, $v \in \mathbb{R}$ and $a \in \mathbb{R}_{\ge 0}$.

6 A conical combination is a linear combination where all coefficients are nonnegative. [Jet86, p. 51]

Proof. According to Definition 54, we must prove the following statements:

$f + g \ge 0$
$v \cdot f \ge 0$
$f \gg a \ge 0$

$[f + g](|f + g|) = 0$
$[v \cdot f](|v \cdot f|) = 0$
$[f \gg a](|f \gg a|) = 0$

Since $f \ge 0$ and $g \ge 0$, it follows that:

$f + g \ge 0$

Since $v \ge 0$ and $f \ge 0$, it follows that:

$v \cdot f \ge 0$

According to (12) and (18), $f \ge 0$ implies:

$(f \gg a) \ge (0 \gg a) = 0$

Without loss of generality, we assume $|f| \ge |g|$. It follows that:

$g(|f|) = g(|g|) = 0$
$|f + g| = \max(|f|, |g|) = |f|$

Hence,

$[f + g](|f + g|) = [f + g](|f|) = f(|f|) + g(|f|) = 0 + 0 = 0$

For $v = 0$ we have:

$[v \cdot f](|v \cdot f|) = [0 \cdot f](|v \cdot f|) = [0](|v \cdot f|) = 0$

For $v > 0$ we apply (21):

$[v \cdot f](|v \cdot f|) = [v \cdot f](|f|) = v \cdot f(|f|) = v \cdot 0 = 0$

For $f = 0$ we apply (12):

$[f \gg a](|f \gg a|) = [0 \gg a](|f \gg a|) = [0](|f \gg a|) = 0$

For $f \ne 0$ we apply (23):

$[f \gg a](|f \gg a|) = [f \gg a](|f| + a) = f(|f| + a - a) = f(|f|) = 0$ □



Lemma 57. All finite interval functions in the nonnegative range that are closed to the
left and open to the right are complete profiles. That is, for all $a, b \in \mathbb{R}_{\ge 0}$ with $a \le b$:

$\chi_{[a,b)} = \varphi_a - \varphi_b \in \mathbb{P}^*$   (28)

Proof. For $a = b$, we have:

$\chi_{[a,b)} = \chi_{\emptyset} = 0 \in \mathbb{P}^*$

For $a < b$, we first note that $\chi_{[a,b)}$ is a profile, because it is a linear combination of
primitive functions:

$\chi_{[a,b)} = 1 \cdot \varphi_a + (-1) \cdot \varphi_b \in \mathbb{P}$

Since the image of $\chi_{[a,b)}$ is $\{0, 1\}$, it follows that:

$\chi_{[a,b)} \ge 0$

From $a < b$ and Definition 52 it follows that $|\chi_{[a,b)}| = |\varphi_a - \varphi_b| = b$, hence:

$\chi_{[a,b)}(|\chi_{[a,b)}|) = \chi_{[a,b)}(b) = 0$

So by Definition 54, $\chi_{[a,b)} \in \mathbb{P}^*$. □

Lemma 58. The complete profiles are the conical hull ⁷ of all finite interval functions
in the nonnegative range that are closed to the left, and open to the right. That is,

$\mathbb{P}^* = \operatorname{coni}(\mathcal{X})$

where

$\mathcal{X} = \{\chi_{[a,b)} \mid a, b \in \mathbb{R}_{\ge 0} \text{ with } a \le b\}$

7 The conical hull $\operatorname{coni}(S)$ is the set of all conical combinations of elements of $S$. [Jet86, p. 51]

Proof. From Lemma 57 it follows that:

$\mathcal{X} \subseteq \mathbb{P}^*$

According to Lemma 56, $\mathbb{P}^*$ is closed under conical combination, so we conclude:

$\operatorname{coni}(\mathcal{X}) \subseteq \mathbb{P}^*$

It remains to show that $\mathbb{P}^* \subseteq \operatorname{coni}(\mathcal{X})$. Let $f \in \mathbb{P}^*$ and represent it as follows:

$f = \sum_{i=1}^{n} a_i \varphi_{t_i}$ with $t_1 < \dots < t_n$

Further, let $0, b_1, \dots, b_n$ be the function values according to Lemma 44. That is,

$b_k = \sum_{i=1}^{k} a_i$ for $k = 1, \dots, n$

Now, $f$ can be written as:

$$
f(t) =
\begin{cases}
0 & t \in (-\infty, t_1) \\
b_1 & t \in [t_1, t_2) \\
\;\vdots \\
b_{n-1} & t \in [t_{n-1}, t_n) \\
b_n & t \in [t_n, \infty)
\end{cases}
$$

This translates directly to a linear combination of interval functions:

$f = 0 \cdot \chi_{(-\infty,t_1)} + b_1 \cdot \chi_{[t_1,t_2)} + \dots + b_{n-1} \cdot \chi_{[t_{n-1},t_n)} + b_n \cdot \chi_{[t_n,\infty)}$

Since $b_n = f(t_n) = f(|f|) = 0$, this simplifies to:

$f = b_1 \cdot \chi_{[t_1,t_2)} + \dots + b_{n-1} \cdot \chi_{[t_{n-1},t_n)}$

Since $f \ge 0$, we know that all $b_i \ge 0$. So this is not just a linear combination, but a
conical combination. Hence,

$f \in \operatorname{coni}(\mathcal{X})$

This proves $\mathbb{P}^* \subseteq \operatorname{coni}(\mathcal{X})$, so in total we have $\mathbb{P}^* = \operatorname{coni}(\mathcal{X})$. □
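
The construction in this proof can be replayed on a concrete profile. The sketch below assumes the dictionary encoding of a profile (event time to coefficient) and uses hypothetical helper names. `interval_decomposition` returns the weights $b_k$ and intervals $[t_k, t_{k+1})$ of the conical combination, and `from_intervals` rebuilds the event representation using $\chi_{[a,b)} = \varphi_a - \varphi_b$ from Lemma 57.

```python
from itertools import accumulate


def interval_decomposition(profile):
    """Return [(b_k, t_k, t_{k+1})] with f = sum of b_k * chi_[t_k, t_{k+1})."""
    times = sorted(t for t, a in profile.items() if a != 0)
    sums = list(accumulate(profile[t] for t in times))
    if any(s < 0 for s in sums) or (sums and sums[-1] != 0):
        raise ValueError("profile is not complete")
    # zip stops before the last time, where the partial sum b_n is zero
    return [(b, lo, hi) for b, lo, hi in zip(sums, times, times[1:]) if b != 0]


def from_intervals(parts):
    """Rebuild {t: a} from weighted intervals via chi_[a,b) = phi_a - phi_b."""
    profile = {}
    for b, lo, hi in parts:
        profile[lo] = profile.get(lo, 0) + b
        profile[hi] = profile.get(hi, 0) - b
    return {t: a for t, a in profile.items() if a != 0}


f = {0: 2, 2: 1, 4: -1, 5: -1, 6: -1}
parts = interval_decomposition(f)
print(parts)
print(from_intervals(parts) == f)
```

All returned weights are nonnegative exactly because the profile is complete, which is the content of the inclusion $\mathbb{P}^* \subseteq \operatorname{coni}(\mathcal{X})$.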

3.4. Arrangement 

Motivation. After having defined all basic operations on profiles, we want to calculate 
the profile for combined schedules as of Definition 38. However, that operation lacks 
useful properties such as associativity. So we introduce an intermediate operation that 
arranges two profiles exactly as needed, but does not prepend the additional operations 
(checkpoint holding and forward calculation), as illustrated in Figure 8. This arrange- 
ment operation has some nice properties and will serve as building block for the real 
combination of schedule profiles. 

Definition 59 (Arrangement). For two profiles $f, g \in \mathbb{P}$ we define $f$ arranged with $g$,
denoted as $f * g$, to be:

$f * g = f \gg \frac{|g|}{2} + g \gg |f|$

Figure 8.: Arrangement.


Lemma 60. $\mathbb{P}^*$ is closed under arrangement. That is, for all $f, g \in \mathbb{P}^*$ we have:

$f * g \in \mathbb{P}^*$

Proof. Arrangement is just a composition of two shifts and one addition, and $\mathbb{P}^*$ is closed
under shift and addition (Lemma 56). So $\mathbb{P}^*$ is also closed under arrangement. □

Lemma 61. Arrangement has the following properties for all $f, g \in \mathbb{P}$, $v \in \mathbb{R}$, $a \in \mathbb{R}_{\ge 0}$:

$0 * f = f$   (29)
$f * 0 = f$   (30)
$v \cdot (f * g) = (v \cdot f) * (v \cdot g)$   (31)
$(f * g) \gg a = (f \gg a) * g$ for $f \ne 0$   (32)

Moreover, for all $f, g, h \in \mathbb{P}^*$:

$(f + g) * h = f * h + g \gg \frac{|h|}{2}$ for $|f| \ge |g|$   (33)
$h * (f + g) = h * f + g \gg |h|$ for $|f| \ge |g|$   (34)
$|f * g| = |f| + |g|$   (35)
$(f * g) * h = f * (g * h)$   (36)

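Proof. Let $f, g \in \mathbb{P}$, $v \in \mathbb{R}$, $a \in \mathbb{R}_{\ge 0}$. We apply (12) and (13) to prove (29) and (30):

$0 * f = 0 \gg \frac{|f|}{2} + f \gg |0| = 0 + (f \gg 0) = f$

$f * 0 = f \gg \frac{|0|}{2} + 0 \gg |f| = (f \gg 0) + 0 = f$

We apply (29) to prove (31) in the case $v = 0$:

$0 \cdot (f * g) = 0 = 0 * 0 = (0 \cdot f) * (0 \cdot g)$

We apply (17) and (21) to prove (31) in the case $v \ne 0$:

$$
\begin{aligned}
v \cdot (f * g) &= v \cdot \left(f \gg \tfrac{|g|}{2} + g \gg |f|\right) \\
&= v \cdot \left(f \gg \tfrac{|g|}{2}\right) + v \cdot (g \gg |f|) \\
&= (v \cdot f) \gg \tfrac{|g|}{2} + (v \cdot g) \gg |f| \\
&= (v \cdot f) \gg \tfrac{|v \cdot g|}{2} + (v \cdot g) \gg |v \cdot f| \\
&= (v \cdot f) * (v \cdot g)
\end{aligned}
$$

Assuming $f \ne 0$, we apply (16), (14) and (23) to prove (32):

$$
\begin{aligned}
(f * g) \gg a &= \left(f \gg \tfrac{|g|}{2} + g \gg |f|\right) \gg a \\
&= \left(f \gg \tfrac{|g|}{2}\right) \gg a + (g \gg |f|) \gg a \\
&= f \gg \left(\tfrac{|g|}{2} + a\right) + g \gg (|f| + a) \\
&= (f \gg a) \gg \tfrac{|g|}{2} + g \gg |f \gg a| \\
&= (f \gg a) * g
\end{aligned}
$$

From now on, let $f, g, h \in \mathbb{P}^*$.

For $|f| \ge |g|$, we apply (16) and (24) to prove (33) and (34):

$$
\begin{aligned}
(f + g) * h &= (f + g) \gg \tfrac{|h|}{2} + h \gg |f + g| \\
&= f \gg \tfrac{|h|}{2} + g \gg \tfrac{|h|}{2} + h \gg \max(|f|, |g|) \\
&= f \gg \tfrac{|h|}{2} + g \gg \tfrac{|h|}{2} + h \gg |f| \\
&= f * h + g \gg \tfrac{|h|}{2}
\end{aligned}
$$

$$
\begin{aligned}
h * (f + g) &= h \gg \tfrac{|f + g|}{2} + (f + g) \gg |h| \\
&= h \gg \tfrac{\max(|f|, |g|)}{2} + f \gg |h| + g \gg |h| \\
&= h \gg \tfrac{|f|}{2} + f \gg |h| + g \gg |h| \\
&= h * f + g \gg |h|
\end{aligned}
$$

We apply (30) and (29) to prove (35) and (36) for the case $g = 0$:

$|f * 0| = |f| = |f| + |0|$

$(f * 0) * h = f * h = f * (0 * h)$

For $g \ne 0$, we apply (22) and (23) to prove a helper inequality:

$\left|f \gg \tfrac{|g|}{2}\right| \le |f| + \tfrac{|g|}{2} < |g| + |f| = |g \gg |f||$

Using that, we apply (24) and (23) to prove (35) for $g \ne 0$:

$$
\begin{aligned}
|f * g| &= \left|f \gg \tfrac{|g|}{2} + g \gg |f|\right| \\
&= \max\left(\left|f \gg \tfrac{|g|}{2}\right|, |g \gg |f||\right) \\
&= |g \gg |f|| \\
&= |g| + |f|
\end{aligned}
$$

Finally, we apply (33), (32), (14) and (35) to prove (36) for $g \ne 0$:

$$
\begin{aligned}
(f * g) * h &= \left(f \gg \tfrac{|g|}{2} + g \gg |f|\right) * h \\
&= (g \gg |f|) * h + \left(f \gg \tfrac{|g|}{2}\right) \gg \tfrac{|h|}{2} \\
&= (g * h) \gg |f| + f \gg \tfrac{|g| + |h|}{2} \\
&= (g * h) \gg |f| + f \gg \tfrac{|g * h|}{2} \\
&= f * (g * h)
\end{aligned}
$$

□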

Remark. The associativity allows us to omit the parentheses as long as we stay in $\mathbb{P}^*$:

\[ f * g * \cdots * h \in \mathbb{P}^* \]

Remark. We cannot generalize (33) to (36) from $\mathbb{P}^*$ to $\mathbb{P}$. For example, consider:

\begin{align*}
f &= \varphi_0 + \varphi_1 \\
g &= -\varphi_0 \\
h &= \varphi_0
\end{align*}

This would be a counterexample to (35), because:

\begin{align*}
|f * g| &= 0 \\
|f| + |g| &= 1
\end{align*}

It would also be a counterexample to (36), because:

\begin{align*}
(f * g) * h &= 2\varphi_0 \\
f * (g * h) &= \varphi_0 + \varphi_1
\end{align*}
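The counterexample can be replayed mechanically. The sketch below is illustrative only: it assumes profiles are stored as dictionaries from time to coefficient, and the helper names are hypothetical rather than taken from the thesis tool.

```python
from fractions import Fraction


def phi(t, c=1):
    """The profile c * phi_t, stored as {time: coefficient}."""
    return {Fraction(t): c}


def add(*profiles):
    """Sum of profiles; terms whose coefficients cancel are dropped."""
    total = {}
    for f in profiles:
        for t, c in f.items():
            total[t] = total.get(t, 0) + c
    return {t: c for t, c in total.items() if c != 0}


def duration(f):
    """|f|: the largest time with a nonzero coefficient (0 for f = 0)."""
    return max(f) if f else Fraction(0)


def shift(f, a):
    """f >> a: delay every term of f by a."""
    return {t + a: c for t, c in f.items()}


def arrange(f, g):
    """Arrangement f * g = f >> |g|/2 + g >> |f|."""
    return add(shift(f, duration(g) / 2), shift(g, duration(f)))


# The three profiles of the remark; f and g are not complete profiles.
f = add(phi(0), phi(1))
g = phi(0, -1)
h = phi(0)

lhs_35 = duration(arrange(f, g))      # |f * g|
rhs_35 = duration(f) + duration(g)    # |f| + |g|
lhs_36 = arrange(arrange(f, g), h)    # (f * g) * h
rhs_36 = arrange(f, arrange(g, h))    # f * (g * h)

# For comparison: a complete profile, where associativity does hold.
e = add(phi(0), phi(2, -1))
assoc_left = arrange(arrange(e, e), e)
assoc_right = arrange(e, arrange(e, e))
```

Both sides of (35) and (36) differ for `f`, `g`, `h`, while the complete profile `e` gives the same result for both bracketings.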


3.5. Algebra Summary 

Remark. Up to now, we built a fairly comprehensive algebra. We introduced new notations and rooted them in the theory of linear algebra and partially ordered sets. We proved that almost everything fits together neatly, and drew clear boundaries around the issues that didn’t work out as nicely as we may have hoped. After all that, it is time to take one step back and look at what we’ve got so far.

Throughout this summary we will use the following variables: 

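\[ a, b \in \mathbb{R}_{\ge 0} \qquad f, g, h \in \mathbb{P} \]

We introduce the following primitives, operations and relations:

| Primitive, operation or relation | Source |
|---|---|
| $0 \in \mathbb{P}$ | Definition 43 |
| $\varphi_a \in \mathbb{P}$ | Definition 42 |
| $f + g \in \mathbb{P}$ | Definition 43 |
| $v \cdot f \in \mathbb{P}$ | Definition 43 |
| $f \le g \in \mathbb{B}$ | Definition 43 |
| $f \gg a \in \mathbb{P}$ | Definition 49 and Lemma 51 |
| $\lvert f \rvert \in \mathbb{R}_{\ge 0}$ | Definition 52 |
| $\mathbb{P}^* \in 2^{\mathbb{P}}$ | Definition 54 |
| $f * g \in \mathbb{P}$ | Definition 59 |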

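These satisfy the following rules within $\mathbb{P}$:

| Rule | Condition | Source |
|---|---|---|
| $0 + f = f$ | | Definition 43 |
| $f + g = g + f$ | | Definition 43 |
| $(f + g) + h = f + (g + h)$ | | Definition 43 |
| $0 \cdot f = 0$ | | Definition 43 |
| $1 \cdot f = f$ | | Definition 43 |
| $v \cdot (w \cdot f) = (vw) \cdot f$ | | Definition 43 |
| $(v + w) \cdot f = (v \cdot f) + (w \cdot f)$ | | Definition 43 |
| $\varphi_a \ge 0$ | | Lemma 48 |
| $\varphi_a \ge \varphi_b$ | for $a \le b$ | Lemma 48 |
| $f + h \le g + h$ | for $f \le g$ | Definition 43 |
| $a \cdot f \ge 0$ | for $f \ge 0$ | Definition 43 |
| $0 \gg a = 0$ | | Lemma 50 |
| $f \gg 0 = f$ | | Lemma 50 |
| $(f \gg a) \gg b = f \gg (a + b)$ | | Lemma 50 |
| $\varphi_a \gg b = \varphi_{a+b}$ | | Lemma 50 |
| $(f + g) \gg a = f \gg a + g \gg a$ | | Lemma 50 |
| $(v \cdot f) \gg a = v \cdot (f \gg a)$ | | Lemma 50 |
| $f \gg a \le g \gg a$ | for $f \le g$ | Lemma 50 |
| $\lvert \varphi_a \rvert = a$ | | Lemma 53 |
| $\lvert f + g \rvert \le \max(\lvert f \rvert, \lvert g \rvert)$ | | Lemma 53 |
| $\lvert v \cdot f \rvert = \lvert f \rvert$ | for $v \ne 0$ | Lemma 53 |
| $\lvert f \gg a \rvert \le \lvert f \rvert + a$ | | Lemma 53 |
| $\lvert f \gg a \rvert = \lvert f \rvert + a$ | for $f \ne 0$ | Lemma 53 |
| $f * g = f \gg \frac{\lvert g \rvert}{2} + g \gg \lvert f \rvert$ | | Definition 59 |
| $0 * f = f$ | | Lemma 61 |
| $f * 0 = f$ | | Lemma 61 |
| $v \cdot (f * g) = (v \cdot f) * (v \cdot g)$ | | Lemma 61 |
| $(f * g) \gg a = (f \gg a) * g$ | for $f \ne 0$ | Lemma 61 |

And they satisfy the following additional rules for $f, g, h \in \mathbb{P}^*$:

| Rule | Condition | Source |
|---|---|---|
| $0 \in \mathbb{P}^*$ | | Definition 54 |
| $\varphi_a - \varphi_b \in \mathbb{P}^*$ | for $a \le b$ | Lemma 57 |
| $f + g \in \mathbb{P}^*$ | | Lemma 56 |
| $a \cdot f \in \mathbb{P}^*$ | | Lemma 56 |
| $f \gg a \in \mathbb{P}^*$ | | Lemma 56 |
| $f * g \in \mathbb{P}^*$ | | Lemma 60 |
| $f \ge 0$ | | Definition 54 |
| $f(\lvert f \rvert) = 0$ | | Definition 54 |
| $\lvert f + g \rvert = \max(\lvert f \rvert, \lvert g \rvert)$ | | Lemma 55 |
| $(f + g) * h = f * h + g \gg \frac{\lvert h \rvert}{2}$ | for $\lvert f \rvert \ge \lvert g \rvert$ | Lemma 61 |
| $h * (f + g) = h * f + g \gg \lvert h \rvert$ | for $\lvert f \rvert \ge \lvert g \rvert$ | Lemma 61 |
| $\lvert f * g \rvert = \lvert f \rvert + \lvert g \rvert$ | | Lemma 61 |
| $(f * g) * h = f * (g * h)$ | | Lemma 61 |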


3.6. Schedule Profiles 

Motivation. Now that we have built our algebra, it is time to apply it to reversal schedules. We start with the elementary schedule $\varepsilon$ and continue with the composition of schedules. We will simultaneously consider processes profiles as well as resources profiles.

Definition 62 (Elementary profile). We define the elementary profile $e$ to be:

\begin{align*}
e &\in \mathbb{P}^* \\
e &= \varphi_0 - \varphi_2
\end{align*}

Lemma 63. The profile $e$ describes the processes profile as well as the resources profile of the elementary schedule $\varepsilon$. That is:

\begin{align*}
\mathrm{procp}\, \varepsilon &= e \\
\mathrm{resp}\, \varepsilon &= e
\end{align*}

Proof. Schedule $\varepsilon$ starts with a preparation step from time 0 to time 1, that is, $\varphi_0 - \varphi_1$. This is followed by a reversal step from time 1 to time 2, that is, $\varphi_1 - \varphi_2$. Their sum is $(\varphi_0 - \varphi_1) + (\varphi_1 - \varphi_2) = \varphi_0 - \varphi_2 = e$. □

Lemma 64 (Profile composition). The composition of two profiles $f, g \in \mathbb{P}$ is one of the following operations, depending on whether we deal with processes or resources profiles:

\begin{align*}
[f, g] &= f * g + \varphi_0 - \varphi_{\frac{|g|}{2}} && \text{(processes)} \\
(f, g) &= f * g + 2\varphi_0 - \varphi_{\frac{|g|}{2}} - \varphi_{|f|} && \text{(resources)}
\end{align*}

That is, for all schedules $S$ and $T$ we have:

\begin{align*}
\mathrm{procp}\, (S, T) &= [\mathrm{procp}\, S, \mathrm{procp}\, T] \\
\mathrm{resp}\, (S, T) &= (\mathrm{resp}\, S, \mathrm{resp}\, T)
\end{align*}

Proof. To understand how this describes the composition of schedules, we must expand the arrangement operation $f * g$ and reorder the summands:

\begin{align*}
[f, g] &= \left( \varphi_0 - \varphi_{\frac{|g|}{2}} + f \gg \frac{|g|}{2} \right) + g \gg |f| \\
(f, g) &= \left( \varphi_0 - \varphi_{\frac{|g|}{2}} + f \gg \frac{|g|}{2} \right) + \left( \varphi_0 - \varphi_{|f|} + g \gg |f| \right)
\end{align*}

These expressions correspond directly to the following natural language description of Definition 38:

• For the processes profile $[f, g]$, we count the forward steps from time 0 to time $\frac{|g|}{2}$, followed by $f$ starting at time $\frac{|g|}{2}$. Concurrently, $g$ starts at time $|f|$.

• The resources profile $(f, g)$ is almost identical, except that we also count the checkpoint holding from time 0 to time $|f|$ before starting $g$ at time $|f|$.

□
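The two composition operations can be tried out directly. This is a minimal sketch under the assumption that profiles are dictionaries from time to coefficient; `compose_proc` and `compose_res` are illustrative names for $[f, g]$ and $(f, g)$, not identifiers from the thesis tool.

```python
from fractions import Fraction


def phi(t, c=1):
    """The profile c * phi_t, stored as {time: coefficient}."""
    return {Fraction(t): c}


def add(*profiles):
    """Sum of profiles; terms whose coefficients cancel are dropped."""
    total = {}
    for f in profiles:
        for t, c in f.items():
            total[t] = total.get(t, 0) + c
    return {t: c for t, c in total.items() if c != 0}


def duration(f):
    """|f|: the largest time with a nonzero coefficient (0 for f = 0)."""
    return max(f) if f else Fraction(0)


def shift(f, a):
    """f >> a: delay every term of f by a."""
    return {t + a: c for t, c in f.items()}


def arrange(f, g):
    """Arrangement f * g = f >> |g|/2 + g >> |f|."""
    return add(shift(f, duration(g) / 2), shift(g, duration(f)))


def compose_proc(f, g):
    """Processes profile [f, g] = f * g + phi_0 - phi_{|g|/2}."""
    return add(arrange(f, g), phi(0), phi(duration(g) / 2, -1))


def compose_res(f, g):
    """Resources profile (f, g) = f * g + 2 phi_0 - phi_{|g|/2} - phi_{|f|}."""
    return add(arrange(f, g), phi(0, 2),
               phi(duration(g) / 2, -1), phi(duration(f), -1))


e = add(phi(0), phi(2, -1))    # elementary profile phi_0 - phi_2
proc_ee = compose_proc(e, e)   # phi_0 + phi_2 - phi_3 - phi_4
res_ee = compose_res(e, e)     # 2 phi_0 - phi_3 - phi_4
```

The only difference between the two operations is the extra term $\varphi_0 - \varphi_{|f|}$, which accounts for the checkpoint held while $f$ runs.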

Motivation. Since we are dealing with profiles of real schedules, we expect the composition operations to preserve complete profiles. Also, we expect the total duration to be the sum of the individual durations. Just to be sure, we show both properties formally.

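Lemma 65. $\mathbb{P}^*$ is closed under composition. That is, for all $f, g \in \mathbb{P}^*$:

\begin{align*}
[f, g] &\in \mathbb{P}^* \\
(f, g) &\in \mathbb{P}^*
\end{align*}

Proof. First, we apply Lemma 57 to $0 \le \frac{|g|}{2}$ and $0 \le |f|$:

\begin{align*}
\varphi_0 - \varphi_{\frac{|g|}{2}} &\in \mathbb{P}^* \\
\varphi_0 - \varphi_{|f|} &\in \mathbb{P}^*
\end{align*}

According to Lemmas 56 and 60, we know that $\mathbb{P}^*$ is closed under addition and arrangement. Hence,

\begin{align*}
[f, g] &= f * g + \left( \varphi_0 - \varphi_{\frac{|g|}{2}} \right) \in \mathbb{P}^* \\
(f, g) &= f * g + \left( \varphi_0 - \varphi_{\frac{|g|}{2}} \right) + \left( \varphi_0 - \varphi_{|f|} \right) \in \mathbb{P}^*
\end{align*}
□

Lemma 66. For all $f, g \in \mathbb{P}^*$:

\[ |[f, g]| = |(f, g)| = |f| + |g| \tag{37} \]

Proof. This is a direct application of (24) and (35):

\begin{align*}
|[f, g]| &= \max\left( |f * g|, 0, \frac{|g|}{2} \right) = \max\left( |f| + |g|, 0, \frac{|g|}{2} \right) = |f| + |g| \\
|(f, g)| &= \max\left( |f * g|, 0, \frac{|g|}{2}, |f| \right) = \max\left( |f| + |g|, 0, \frac{|g|}{2}, |f| \right) = |f| + |g|
\end{align*}
□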

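Lemma 67 (Profiles of small schedules). The profile of $\varepsilon$ is:

\begin{align*}
e &= \varphi_0 - \varphi_2 \\
|e| &= 2
\end{align*}

The profile of $(\varepsilon, \varepsilon)$ is:

\begin{align*}
[e, e] &= \varphi_0 + \varphi_2 - \varphi_3 - \varphi_4 \\
(e, e) &= 2\varphi_0 - \varphi_3 - \varphi_4 \\
|\cdots| &= 4
\end{align*}

The profile of $((\varepsilon, \varepsilon), \varepsilon)$ is:

\begin{align*}
[[e, e], e] &= \varphi_0 + \varphi_3 - \varphi_5 - \varphi_6 \\
((e, e), e) &= 2\varphi_0 + \varphi_1 - \varphi_4 - \varphi_5 - \varphi_6 \\
|\cdots| &= 6
\end{align*}

The profile of $(\varepsilon, (\varepsilon, \varepsilon))$ is:

\begin{align*}
[e, [e, e]] &= \varphi_0 + \varphi_2 - \varphi_5 - \varphi_6 \\
(e, (e, e)) &= 2\varphi_0 + \varphi_2 - \varphi_4 - \varphi_5 - \varphi_6 \\
|\cdots| &= 6
\end{align*}

Remark. These schedules are depicted in Figure 6. Moreover, Figure 7 shows the resources profile of $(\varepsilon, (\varepsilon, \varepsilon))$.

Proof. This is just an exercise in expanding the terms by applying the definitions:

\begin{align*}
|e| &= |\varphi_0 - \varphi_2| \\
&= 2 \\[1ex]
[e, e] &= e * e + \varphi_0 - \varphi_1 \\
&= (e \gg 1) + (e \gg 2) + \varphi_0 - \varphi_1 \\
&= (\varphi_1 - \varphi_3) + (\varphi_2 - \varphi_4) + \varphi_0 - \varphi_1 \\
&= \varphi_0 + \varphi_2 - \varphi_3 - \varphi_4 \\[1ex]
(e, e) &= e * e + 2\varphi_0 - \varphi_1 - \varphi_2 \\
&= (e \gg 1) + (e \gg 2) + 2\varphi_0 - \varphi_1 - \varphi_2 \\
&= (\varphi_1 - \varphi_3) + (\varphi_2 - \varphi_4) + 2\varphi_0 - \varphi_1 - \varphi_2 \\
&= 2\varphi_0 - \varphi_3 - \varphi_4 \\[1ex]
[[e, e], e] &= [e, e] * e + \varphi_0 - \varphi_1 \\
&= ([e, e] \gg 1) + (e \gg 4) + \varphi_0 - \varphi_1 \\
&= (\varphi_1 + \varphi_3 - \varphi_4 - \varphi_5) + (\varphi_4 - \varphi_6) + \varphi_0 - \varphi_1 \\
&= \varphi_0 + \varphi_3 - \varphi_5 - \varphi_6 \\[1ex]
((e, e), e) &= (e, e) * e + 2\varphi_0 - \varphi_1 - \varphi_4 \\
&= ((e, e) \gg 1) + (e \gg 4) + 2\varphi_0 - \varphi_1 - \varphi_4 \\
&= (2\varphi_1 - \varphi_4 - \varphi_5) + (\varphi_4 - \varphi_6) + 2\varphi_0 - \varphi_1 - \varphi_4 \\
&= 2\varphi_0 + \varphi_1 - \varphi_4 - \varphi_5 - \varphi_6 \\[1ex]
[e, [e, e]] &= e * [e, e] + \varphi_0 - \varphi_2 \\
&= (e \gg 2) + ([e, e] \gg 2) + \varphi_0 - \varphi_2 \\
&= (\varphi_2 - \varphi_4) + (\varphi_2 + \varphi_4 - \varphi_5 - \varphi_6) + \varphi_0 - \varphi_2 \\
&= \varphi_0 + \varphi_2 - \varphi_5 - \varphi_6 \\[1ex]
(e, (e, e)) &= e * (e, e) + 2\varphi_0 - \varphi_2 - \varphi_2 \\
&= (e \gg 2) + ((e, e) \gg 2) + 2\varphi_0 - \varphi_2 - \varphi_2 \\
&= (\varphi_2 - \varphi_4) + (2\varphi_2 - \varphi_5 - \varphi_6) + 2\varphi_0 - \varphi_2 - \varphi_2 \\
&= 2\varphi_0 + \varphi_2 - \varphi_4 - \varphi_5 - \varphi_6
\end{align*}
□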

Motivation. We clearly see that our new multiplications as of Lemma 64 are not associative, as associativity fails to hold for simple examples:

\begin{align*}
[e, [e, e]] &\neq [[e, e], e] \\
(e, (e, e)) &\neq ((e, e), e)
\end{align*}

This is no surprise, as these describe the profiles of $(\varepsilon, (\varepsilon, \varepsilon))$ and $((\varepsilon, \varepsilon), \varepsilon)$, respectively, which are different schedules with different resource usage.

So we follow the usual approach when associativity fails: We calculate the associator for each type of multiplication. Note that although we borrow that term from the theory of nonassociative algebras [Sch66, p. 13], there are important differences to keep in mind:

1. Our multiplications as of Lemma 64 are not bilinear, so their associators are not linear in each argument, either. Hence, the classic results about nonassociative algebras are not applicable here. [Sch66, p. 1]

2. We calculate $[f, [g, h]] - [[f, g], h]$ instead of $[[f, g], h] - [f, [g, h]]$ to make the right-hand sides easier to understand. That is, our associators have the reverse sign.

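Lemma 68 (Associators). For all complete profiles $f, g, h \in \mathbb{P}^*$:

\begin{align}
[f, [g, h]] - [[f, g], h] &= \varphi_{|f|} - \varphi_{|f| + \frac{|h|}{2}} \tag{38} \\
(f, (g, h)) - ((f, g), h) &= \varphi_{|f|} - \varphi_{\frac{|h|}{2}} \tag{39}
\end{align}

Proof. We expand the first term using Lemma 64, (37), (34), (36), (15) and (16). The remaining terms expand in almost the same way.

\begin{align*}
[f, [g, h]] &= f * [g, h] + \varphi_0 - \varphi_{\frac{|[g, h]|}{2}} \\
&= f * \left( g * h + \varphi_0 - \varphi_{\frac{|h|}{2}} \right) + \varphi_0 - \varphi_{\frac{|g| + |h|}{2}} \\
&= f * g * h + \left( \varphi_0 - \varphi_{\frac{|h|}{2}} \right) \gg |f| + \varphi_0 - \varphi_{\frac{|g| + |h|}{2}} \\
&= f * g * h + \varphi_{|f|} - \varphi_{|f| + \frac{|h|}{2}} + \varphi_0 - \varphi_{\frac{|g| + |h|}{2}} \\
&= f * g * h + \varphi_0 - \varphi_{\frac{|g| + |h|}{2}} + \varphi_{|f|} - \varphi_{|f| + \frac{|h|}{2}} \\[1ex]
[[f, g], h] &= f * g * h + \varphi_0 - \varphi_{\frac{|g| + |h|}{2}}
\end{align*}

Subtracting both equations proves (38). Further,

\begin{align*}
(f, (g, h)) &= f * g * h + 2\varphi_0 - \varphi_{\frac{|g| + |h|}{2}} - \varphi_{|f| + \frac{|h|}{2}} - \varphi_{|f| + |g|} + \varphi_{|f|} \\
((f, g), h) &= f * g * h + 2\varphi_0 - \varphi_{\frac{|g| + |h|}{2}} - \varphi_{|f| + \frac{|h|}{2}} - \varphi_{|f| + |g|} + \varphi_{\frac{|h|}{2}}
\end{align*}

Subtracting both equations proves (39). □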
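The associator formulas can be checked numerically on a few small complete profiles. The sketch below is an illustration under stated assumptions (profiles as dictionaries from time to coefficient, hypothetical helper names); it compares the directly computed difference with the right-hand sides of (38) and (39).

```python
from fractions import Fraction
from itertools import product


def phi(t, c=1):
    return {Fraction(t): c}


def add(*profiles):
    total = {}
    for f in profiles:
        for t, c in f.items():
            total[t] = total.get(t, 0) + c
    return {t: c for t, c in total.items() if c != 0}


def sub(f, g):
    """Difference f - g of two profiles."""
    return add(f, {t: -c for t, c in g.items()})


def duration(f):
    return max(f) if f else Fraction(0)


def shift(f, a):
    return {t + a: c for t, c in f.items()}


def arrange(f, g):
    return add(shift(f, duration(g) / 2), shift(g, duration(f)))


def compose_proc(f, g):
    return add(arrange(f, g), phi(0), phi(duration(g) / 2, -1))


def compose_res(f, g):
    return add(arrange(f, g), phi(0, 2),
               phi(duration(g) / 2, -1), phi(duration(f), -1))


def associator(compose, f, g, h):
    """compose(f, compose(g, h)) - compose(compose(f, g), h)."""
    return sub(compose(f, compose(g, h)), compose(compose(f, g), h))


def predicted_proc(f, h):
    """Right-hand side of (38): phi_{|f|} - phi_{|f| + |h|/2}."""
    return add(phi(duration(f)), phi(duration(f) + duration(h) / 2, -1))


def predicted_res(f, h):
    """Right-hand side of (39): phi_{|f|} - phi_{|h|/2}."""
    return add(phi(duration(f)), phi(duration(h) / 2, -1))


e = add(phi(0), phi(2, -1))
samples = [e, compose_proc(e, e), compose_res(e, compose_res(e, e))]
all_match = all(
    associator(compose_proc, f, g, h) == predicted_proc(f, h)
    and associator(compose_res, f, g, h) == predicted_res(f, h)
    for f, g, h in product(samples, repeat=3)
)
```

Note that `g` does not appear in either prediction, which mirrors the observation that both associators are independent of the middle profile.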

Remark. In this proof, we see clearly how the introduction of the associative arrangement operator paid off, as promised in Section 3.4.

Motivation. Our associators have a surprisingly simple form. Both are independent of $g$. Moreover, only the durations of $f$ and $h$ matter, not their exact shape. This gives rise to the main theorem in this section:

Theorem 69 (Associator inequalities). For all complete profiles $f, g, h \in \mathbb{P}^*$ with $|h| > 0$:

\begin{align}
[f, [g, h]] &> [[f, g], h] \tag{40} \\
(f, (g, h)) &> ((f, g), h) \quad \text{for } |f| < \frac{|h|}{2} \tag{41} \\
(f, (g, h)) &< ((f, g), h) \quad \text{for } |f| > \frac{|h|}{2} \tag{42}
\end{align}


Remark. This theorem helps us to decide how to combine three schedules. Should we combine them as $(f, (g, h))$ or as $((f, g), h)$?

• If $|f| < \frac{|h|}{2}$, we should always use $((f, g), h)$, because that one uses fewer processes and not more resources. This is demonstrated in Figure 9.

• If $|f| > \frac{|h|}{2}$, we should consider both variants, because $((f, g), h)$ uses fewer processes, but $(f, (g, h))$ uses fewer resources.

Figure 9.: Associator inequality.


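Proof. Applying (11) to $|f| \le |f| + \frac{|h|}{2}$ leads to:

\[ \varphi_{|f|} \ge \varphi_{|f| + \frac{|h|}{2}} \]

Moreover, since $|h| > 0$ we know that $\varphi_{|f|} \neq \varphi_{|f| + \frac{|h|}{2}}$. Hence,

\[ \varphi_{|f|} > \varphi_{|f| + \frac{|h|}{2}} \]

In the same way, we conclude:

\begin{align*}
\varphi_{|f|} &> \varphi_{\frac{|h|}{2}} \quad \text{for } |f| < \frac{|h|}{2} \\
\varphi_{|f|} &< \varphi_{\frac{|h|}{2}} \quad \text{for } |f| > \frac{|h|}{2}
\end{align*}

Finally, we apply Lemma 68 to prove (40) to (42):

\begin{align*}
[f, [g, h]] - [[f, g], h] &= \varphi_{|f|} - \varphi_{|f| + \frac{|h|}{2}} > 0 \\
(f, (g, h)) - ((f, g), h) &= \varphi_{|f|} - \varphi_{\frac{|h|}{2}} > 0 \quad \text{for } |f| < \frac{|h|}{2} \\
(f, (g, h)) - ((f, g), h) &= \varphi_{|f|} - \varphi_{\frac{|h|}{2}} < 0 \quad \text{for } |f| > \frac{|h|}{2}
\end{align*}
□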






4. Optimal and Suboptimal Schedules 


In this chapter we will develop a double sequence $S_k^p$ of schedules, each of which will use no more than $p$ processes and $k$ resources. Ideally, every $S_k^p$ would be optimal in the sense that it achieves the maximum reach $\ell$, given $p$ and $k$ with $p \ge 1$ and $k \ge p$. That is, we would like to have:

\[ \ell(S_k^p) = \ell_{\max}(p, k) \]

At present, however, only certain classes of optima are known. For the remaining classes suboptimal schedules are provided. Those schedules will provide a lower bound on the maximum reach:

\[ \ell(S_k^p) \le \ell_{\max}(p, k) \]

All schedules $S_k^p$ are implemented in the tool (Appendix A).

4.1. Fibonacci Schedules 

Definition 70 (Fibonacci schedules). We define $S_k^k$ to be the optimal schedules for $p = k$ developed in [Wal99]:

\begin{align*}
S_1^1 &:= \varepsilon \\
S_2^2 &:= (\varepsilon, \varepsilon) \\
S_k^k &:= (S_{k-2}^{k-2}, S_{k-1}^{k-1}) \quad \text{for } k \ge 3
\end{align*}

Example 71. Some of these schedules are shown in Figure 10.

Remark. We refer to these schedules as Fibonacci schedules, because they exhibit the same recursive structure as the Fibonacci numbers, and because their reaches $\ell$ are exactly the Fibonacci numbers (shifted by one position):

\[ \ell(S_k^k) = \ell_{\max}(k, k) = F_{k+1} \]

As explained in Section 1.3, these schedules are also optimal for sufficiently large $p < k$, but we will not necessarily define the corresponding $S_k^p$ with $p < k$ to be $S_k^k$. Instead, we will define those $S_k^p$ separately in the following sections, which may lead to different schedules that fit better into the systematics of the sequence for that respective $p$.
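The Fibonacci structure can be reproduced with the profile algebra of Chapter 3. The following is a hedged sketch, not the thesis tool: schedules are modelled as nested Python tuples with the string `"eps"` for $\varepsilon$, the recursion $S_k^k = (S_{k-2}^{k-2}, S_{k-1}^{k-1})$ is read off from the schedules in Figure 10, the reach is taken as half the profile duration, and the maximum resource usage is taken as the largest partial sum of the profile coefficients. All function names are illustrative.

```python
from fractions import Fraction

EPS = "eps"


def phi(t, c=1):
    return {Fraction(t): c}


def add(*profiles):
    total = {}
    for f in profiles:
        for t, c in f.items():
            total[t] = total.get(t, 0) + c
    return {t: c for t, c in total.items() if c != 0}


def duration(f):
    return max(f) if f else Fraction(0)


def shift(f, a):
    return {t + a: c for t, c in f.items()}


def arrange(f, g):
    return add(shift(f, duration(g) / 2), shift(g, duration(f)))


def compose_proc(f, g):
    return add(arrange(f, g), phi(0), phi(duration(g) / 2, -1))


def compose_res(f, g):
    return add(arrange(f, g), phi(0, 2),
               phi(duration(g) / 2, -1), phi(duration(f), -1))


def profile(schedule, compose):
    """Profile of a nested-tuple schedule under the given composition."""
    if schedule == EPS:
        return add(phi(0), phi(2, -1))
    left, right = schedule
    return compose(profile(left, compose), profile(right, compose))


def fibonacci_schedule(k):
    """S_k^k, with the recursion read off from Figure 10."""
    if k == 1:
        return EPS
    if k == 2:
        return (EPS, EPS)
    return (fibonacci_schedule(k - 2), fibonacci_schedule(k - 1))


def reach(schedule):
    """Number of reversed steps: half the duration of the profile."""
    return duration(profile(schedule, compose_proc)) / 2


def peak(f):
    """Largest partial sum of the coefficients, in order of time."""
    total, best = 0, 0
    for t in sorted(f):
        total += f[t]
        best = max(best, total)
    return best


reaches = [reach(fibonacci_schedule(k)) for k in range(1, 8)]
```

Under these assumptions, `reaches` lists the Fibonacci numbers $F_2, F_3, \dots$, and `peak(profile(fibonacci_schedule(k), compose_res))` gives the resource usage of $S_k^k$.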
$S_4^4 = ((\varepsilon, \varepsilon), (\varepsilon, (\varepsilon, \varepsilon)))$
$\mathrm{procp}\, S_4^4 = \varphi_0 + \varphi_4 + \varphi_5 - \varphi_7 - \varphi_9 - \varphi_{10}$
$\mathrm{resp}\, S_4^4 = 2\varphi_0 + \varphi_3 + \varphi_4 - \varphi_7 - \varphi_8 - \varphi_9 - \varphi_{10}$
$\ell(S_4^4) = 5$

$S_5^5 = ((\varepsilon, (\varepsilon, \varepsilon)), ((\varepsilon, \varepsilon), (\varepsilon, (\varepsilon, \varepsilon))))$
$\mathrm{procp}\, S_5^5 = \varphi_0 + \varphi_6 + \varphi_7 - \varphi_{13} - \varphi_{15} - \varphi_{16}$
$\mathrm{resp}\, S_5^5 = 2\varphi_0 + \varphi_5 + \varphi_6 + \varphi_7 - \varphi_{11} - \varphi_{13} - \varphi_{14} - \varphi_{15} - \varphi_{16}$
$\ell(S_5^5) = 8$

$S_6^6 = (((\varepsilon, \varepsilon), (\varepsilon, (\varepsilon, \varepsilon))), ((\varepsilon, (\varepsilon, \varepsilon)), ((\varepsilon, \varepsilon), (\varepsilon, (\varepsilon, \varepsilon)))))$
$\mathrm{procp}\, S_6^6 = \varphi_0 + \varphi_{10} + \varphi_{12} + \varphi_{13} - \varphi_{15} + \varphi_{16} - \varphi_{18} - \varphi_{23} - \varphi_{25} - \varphi_{26}$
$\mathrm{resp}\, S_6^6 = 2\varphi_0 + \varphi_8 + \varphi_{10} + \varphi_{11} + \varphi_{12} - \varphi_{18} - \varphi_{21} - \varphi_{23} - \varphi_{24} - \varphi_{25} - \varphi_{26}$
$\ell(S_6^6) = 13$

Figure 10.: Optimal schedules for $p = k$. These Fibonacci schedules are also optimal for sufficiently large $p < k$.

4.2. Exhaustive Search

An exhaustive search has been implemented to find optimal schedules for small values of $p$ and $k$. This search has been implemented in OCaml; see Appendix A for details. One particularly interesting schedule discovered by the search is presented in Appendix D.

The search result is summarized in Figure 11, which lists the maximum reach $\ell_{\max}(p, k)$ of the found schedules. Unknown values that could not be computed within an acceptable amount of time are marked as “?”. Rows and columns with known continuation are marked as “···” and will be explained in the following sections. Columns whose continuation is conjectured are marked as “(···)”. Values which are inapplicable because of $p > k$ are marked as “–”. Values which correspond to the already known Fibonacci schedules are written in parentheses.

To increase the range of the exhaustive search, it has been reformulated via integer linear programming and also via Petri nets [Che14, p. 9]. Interval Petri nets [PZ13] have been considered, too, but were discarded because they are even more computationally expensive to analyze, and because classic Petri nets are already sufficient to describe parallel reversal schedules. Unfortunately, none of these approaches provided any improvement over the directly implemented search.

Moreover, the need for an exhaustive search diminished, given the schedules presented in the following sections.


| $\ell_{\max}(p, k)$ | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 | p ≥ 7 |
|---|---|---|---|---|---|---|---|
| k = 1 | (1) | – | – | – | – | – | – |
| k = 2 | 1 | (2) | – | – | – | – | – |
| k = 3 | 1 | (3) | (3) | – | – | – | – |
| k = 4 | 1 | 4 | (5) | (5) | – | – | – |
| k = 5 | 1 | 5 | (8) | (8) | (8) | – | – |
| k = 6 | 1 | 6 | 12 | (13) | (13) | (13) | |
| k = 7 | 1 | 7 | 17 | (21) | (21) | (21) | ··· |
| k = 8 | 1 | 8 | 23 | 32 | (34) | (34) | ··· |
| k = 9 | 1 | 9 | 30 | 47 | (55) | (55) | ··· |
| k = 10 | 1 | 10 | 38 | 67 | 84 | (89) | ··· |
| k = 11 | 1 | 11 | 47 | ? | ? | (144) | ··· |
| k = 12 | 1 | 12 | 57 | ? | ? | ? | |
| k ≥ 13 | | | (···) | | | | |

Figure 11.: Maximum reach of small schedules. Notation: “?” = unknown, “···” = continuation is known, “(···)” = continuation is conjectured, “–” = inapplicable, “($F_n$)” = corresponds to a Fibonacci schedule.

4.3. One Process

Definition 72. For $p = 1$, we define:

\[ S_k^1 := \varepsilon \quad \text{for all } k \ge 2 \]

Note that $S_1^1 = \varepsilon$ is already defined in Section 4.1.

Example 73. For completeness, this simple schedule is depicted in Figure 12.

Theorem 74. The schedules $S_k^1$ are optimal, that is:

\[ \ell_{\max}(1, k) = \ell(S_k^1) = 1 \]

The profiles are:

\begin{align*}
\mathrm{procp}\, S_k^1 &= \varphi_0 - \varphi_2 & \mathrm{proc}\, S_k^1 &= 1 \\
\mathrm{resp}\, S_k^1 &= \varphi_0 - \varphi_2 & \mathrm{res}\, S_k^1 &= 1
\end{align*}

Proof. $S_k^1 = \varepsilon$ is the only schedule that needs just one process, because all schedules for reversal of more than one physical step need to run a P and an R action in parallel. Since this is the only schedule for $p = 1$, it is the optimum. The profiles are given by Lemma 63. The maximum resources and processes follow directly from the profiles. □

\[ S_1^1 = S_2^1 = \cdots = S_k^1 = \varepsilon \]

Figure 12.: Optimal schedules (all identical) for $p = 1$.
4.4. Two Processes

Definition 75. For $p = 2$, we define:

\[ S_k^2 := (S_{k-1}^2, \varepsilon) \quad \text{for } k \ge 3 \]

Note that $S_2^2 = (\varepsilon, \varepsilon)$ is already defined in Section 4.1.

Example 76. Some of these schedules are shown in Figure 13.

Theorem 77. The schedules $S_k^2$ for $k \ge 3$ are optimal, that is:

\[ \ell_{\max}(2, k) = \ell(S_k^2) = k \]

The profiles are:

\begin{align*}
\mathrm{procp}\, S_k^2 &= \varphi_0 + \varphi_k - \varphi_{2k-1} - \varphi_{2k} & \mathrm{proc}\, S_k^2 &= 2 \\
\mathrm{resp}\, S_k^2 &= 2\varphi_0 + \sum_{i=1}^{k-2} \varphi_i - \sum_{i=k+1}^{2k} \varphi_i & \mathrm{res}\, S_k^2 &= k
\end{align*}

Proof. The profiles are calculated via Lemma 64. The reach $\ell(S_k^2) = k$ follows directly from the profile durations $|\mathrm{procp}\, S_k^2| = |\mathrm{resp}\, S_k^2| = 2k$. The maximum resources and processes also follow directly from the profiles, as the maximal partial sum of the coefficients.

We will show optimality by contradiction. Assume for some $k$ that we found a schedule $S'$ which needs at most $k$ resources, but reverses $\ell = k + 1$ physical steps. During reversal, the two processes are needed for the parallel P and R actions, so there can’t be any F action during the second half of the schedule. This means that all physical steps $k - 1, \dots, 1$ must be given by C actions. In particular, during the first reversal we have:

• one R action for the physical step $k + 1$,

• one P action for the physical step $k$, and

• one C action for each of the physical steps $1, \dots, k - 1$.

In total, we have $k + 1$ actions in parallel, which contradicts our assumption that $S'$ needs at most $k$ resources. Hence, the $S_k^2$ are optimal. □
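The closed-form profiles of Theorem 77 can be compared with profiles obtained by repeated composition. This is an illustrative sketch only: schedules are nested tuples with `"eps"` for $\varepsilon$, profiles are dictionaries from time to coefficient, and all names are hypothetical.

```python
from fractions import Fraction

EPS = "eps"


def phi(t, c=1):
    return {Fraction(t): c}


def add(*profiles):
    total = {}
    for f in profiles:
        for t, c in f.items():
            total[t] = total.get(t, 0) + c
    return {t: c for t, c in total.items() if c != 0}


def duration(f):
    return max(f) if f else Fraction(0)


def shift(f, a):
    return {t + a: c for t, c in f.items()}


def arrange(f, g):
    return add(shift(f, duration(g) / 2), shift(g, duration(f)))


def compose_proc(f, g):
    return add(arrange(f, g), phi(0), phi(duration(g) / 2, -1))


def compose_res(f, g):
    return add(arrange(f, g), phi(0, 2),
               phi(duration(g) / 2, -1), phi(duration(f), -1))


def profile(schedule, compose):
    if schedule == EPS:
        return add(phi(0), phi(2, -1))
    left, right = schedule
    return compose(profile(left, compose), profile(right, compose))


def two_process_schedule(k):
    """S_k^2 = (S_{k-1}^2, eps), starting from S_2^2 = (eps, eps)."""
    schedule = (EPS, EPS)
    for _ in range(3, k + 1):
        schedule = (schedule, EPS)
    return schedule


def closed_form_procp(k):
    """phi_0 + phi_k - phi_{2k-1} - phi_{2k}."""
    return add(phi(0), phi(k), phi(2 * k - 1, -1), phi(2 * k, -1))


def closed_form_resp(k):
    """2 phi_0 + sum_{i=1}^{k-2} phi_i - sum_{i=k+1}^{2k} phi_i."""
    rising = [phi(i) for i in range(1, k - 1)]
    falling = [phi(i, -1) for i in range(k + 1, 2 * k + 1)]
    return add(phi(0, 2), *rising, *falling)


def peak(f):
    """Largest partial sum of the coefficients, in order of time."""
    total, best = 0, 0
    for t in sorted(f):
        total += f[t]
        best = max(best, total)
    return best
```

For example, `peak(closed_form_resp(k))` evaluates the maximal partial sum that the proof uses to read off the resource requirement.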
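$S_4^2 = (((\varepsilon, \varepsilon), \varepsilon), \varepsilon)$
$\mathrm{procp}\, S_4^2 = \varphi_0 + \varphi_4 - \varphi_7 - \varphi_8$
$\mathrm{resp}\, S_4^2 = 2\varphi_0 + \varphi_1 + \varphi_2 - \varphi_5 - \varphi_6 - \varphi_7 - \varphi_8$
$\ell(S_4^2) = 4$

$S_5^2 = ((((\varepsilon, \varepsilon), \varepsilon), \varepsilon), \varepsilon)$
$\mathrm{procp}\, S_5^2 = \varphi_0 + \varphi_5 - \varphi_9 - \varphi_{10}$
$\mathrm{resp}\, S_5^2 = 2\varphi_0 + \varphi_1 + \varphi_2 + \varphi_3 - \varphi_6 - \varphi_7 - \varphi_8 - \varphi_9 - \varphi_{10}$
$\ell(S_5^2) = 5$

$S_6^2 = (((((\varepsilon, \varepsilon), \varepsilon), \varepsilon), \varepsilon), \varepsilon)$
$\mathrm{procp}\, S_6^2 = \varphi_0 + \varphi_6 - \varphi_{11} - \varphi_{12}$
$\mathrm{resp}\, S_6^2 = 2\varphi_0 + \varphi_1 + \varphi_2 + \varphi_3 + \varphi_4 - \varphi_7 - \varphi_8 - \varphi_9 - \varphi_{10} - \varphi_{11} - \varphi_{12}$
$\ell(S_6^2) = 6$

Figure 13.: Optimal schedules for $p = 2$.

4.5. Three Processes

Definition 78. For $p = 3$, we define:

\begin{align*}
S_4^3 &:= (S_3^2, S_2^2) \\
S_k^3 &:= (S_{k-1}^3, S_{k-2}^2) \quad \text{for } k \ge 5
\end{align*}

Example 79. Some of these schedules are shown in Figure 14.

Theorem 80. The profiles of $S_k^3$ for $k \ge 4$ are:

\begin{align*}
\mathrm{procp}\, S_k^3 &= \varphi_0 + \varphi_\ell + \varphi_{\ell+1} - \varphi_{2\ell-k+1} - \varphi_{2\ell-1} - \varphi_{2\ell} & \mathrm{proc}\, S_k^3 &= 3 \\
\mathrm{resp}\, S_k^3 &= 2\varphi_0 + \sum_{i=1}^{k-2} \varphi_{m_i} - \sum_{i=2\ell-k+1}^{2\ell} \varphi_i & \mathrm{res}\, S_k^3 &= k
\end{align*}

where:

\begin{align*}
\ell &= 2 + \frac{(k-2)(k-1)}{2} \\
m_i &= ki - \frac{i(i+3)}{2}
\end{align*}

Their reach is:

\[ \ell(S_k^3) = 2 + \frac{(k-2)(k-1)}{2} \]

Proof. The profiles are calculated via Lemma 64. The reach, maximum resources and maximum processes follow directly from the profiles. □

Remark. Note that $\ell$ and $m_i$ can also be written as:

\begin{align*}
\ell &= 2 + \sum_{j=1}^{k-2} j \\
m_i &= \sum_{j=k-i-1}^{k-2} j
\end{align*}

which are close to the schedule structures as depicted in Figure 14.

Motivation. The exhaustive search in Figure 11 shows that all schedules $S_k^3$ have exactly the optimal reach for $4 \le k \le 12$. It is likely that these are optimal for all $k$.

Conjecture 81. The schedules $S_k^3$ are optimal, that is:

\[ \ell_{\max}(3, k) = \ell(S_k^3) = 2 + \frac{(k-2)(k-1)}{2} \]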
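The recursive definition and the closed forms of Theorem 80 can be cross-checked by composing profiles. The sketch below rests on the same modelling assumptions as before (nested tuples for schedules, dictionaries for profiles, hypothetical function names) and is meant as an illustration, not as the thesis tool.

```python
from fractions import Fraction

EPS = "eps"


def phi(t, c=1):
    return {Fraction(t): c}


def add(*profiles):
    total = {}
    for f in profiles:
        for t, c in f.items():
            total[t] = total.get(t, 0) + c
    return {t: c for t, c in total.items() if c != 0}


def duration(f):
    return max(f) if f else Fraction(0)


def shift(f, a):
    return {t + a: c for t, c in f.items()}


def arrange(f, g):
    return add(shift(f, duration(g) / 2), shift(g, duration(f)))


def compose_proc(f, g):
    return add(arrange(f, g), phi(0), phi(duration(g) / 2, -1))


def compose_res(f, g):
    return add(arrange(f, g), phi(0, 2),
               phi(duration(g) / 2, -1), phi(duration(f), -1))


def profile(schedule, compose):
    if schedule == EPS:
        return add(phi(0), phi(2, -1))
    left, right = schedule
    return compose(profile(left, compose), profile(right, compose))


def two_process_schedule(k):
    """S_k^2 = (S_{k-1}^2, eps), starting from S_2^2 = (eps, eps)."""
    schedule = (EPS, EPS)
    for _ in range(3, k + 1):
        schedule = (schedule, EPS)
    return schedule


def three_process_schedule(k):
    """S_4^3 = (S_3^2, S_2^2) and S_k^3 = (S_{k-1}^3, S_{k-2}^2) for k >= 5."""
    schedule = (two_process_schedule(3), two_process_schedule(2))
    for j in range(5, k + 1):
        schedule = (schedule, two_process_schedule(j - 2))
    return schedule


def predicted_reach(k):
    return 2 + (k - 2) * (k - 1) // 2


def closed_form_procp(k):
    l = predicted_reach(k)
    return add(phi(0), phi(l), phi(l + 1),
               phi(2 * l - k + 1, -1), phi(2 * l - 1, -1), phi(2 * l, -1))


def closed_form_resp(k):
    l = predicted_reach(k)
    rising = [phi(k * i - i * (i + 3) // 2) for i in range(1, k - 1)]  # m_i
    falling = [phi(i, -1) for i in range(2 * l - k + 1, 2 * l + 1)]
    return add(phi(0, 2), *rising, *falling)


reaches = [duration(profile(three_process_schedule(k), compose_proc)) / 2
           for k in range(4, 13)]
```

Here `reaches` can be compared against the $p = 3$ column of Figure 11 for $4 \le k \le 12$.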
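$S_4^3 = (((\varepsilon, \varepsilon), \varepsilon), (\varepsilon, \varepsilon))$
$\mathrm{procp}\, S_4^3 = \varphi_0 + \varphi_5 + \varphi_6 - \varphi_7 - \varphi_9 - \varphi_{10}$
$\mathrm{resp}\, S_4^3 = 2\varphi_0 + \varphi_2 + \varphi_3 - \varphi_7 - \varphi_8 - \varphi_9 - \varphi_{10}$
$\ell(S_4^3) = 5$

$S_5^3 = ((((\varepsilon, \varepsilon), \varepsilon), (\varepsilon, \varepsilon)), ((\varepsilon, \varepsilon), \varepsilon))$
$\mathrm{procp}\, S_5^3 = \varphi_0 + \varphi_8 + \varphi_9 - \varphi_{12} - \varphi_{15} - \varphi_{16}$
$\mathrm{resp}\, S_5^3 = 2\varphi_0 + \varphi_3 + \varphi_5 + \varphi_6 - \varphi_{12} - \varphi_{13} - \varphi_{14} - \varphi_{15} - \varphi_{16}$
$\ell(S_5^3) = 8$

$S_6^3 = (((((\varepsilon, \varepsilon), \varepsilon), (\varepsilon, \varepsilon)), ((\varepsilon, \varepsilon), \varepsilon)), (((\varepsilon, \varepsilon), \varepsilon), \varepsilon))$
$\mathrm{procp}\, S_6^3 = \varphi_0 + \varphi_{12} + \varphi_{13} - \varphi_{19} - \varphi_{23} - \varphi_{24}$
$\mathrm{resp}\, S_6^3 = 2\varphi_0 + \varphi_4 + \varphi_7 + \varphi_9 + \varphi_{10} - \varphi_{19} - \varphi_{20} - \varphi_{21} - \varphi_{22} - \varphi_{23} - \varphi_{24}$
$\ell(S_6^3) = 12$

Figure 14.: Optimal schedules for $p = 3$.

4.6. Four or more Processes

For $p = 1, 2, 3$ we observed a constant, linear and quadratic pattern in the schedule reaches $\ell(S_k^p)$ as well as in the schedules themselves. Unfortunately, this does not extend to $p \ge 4$. Here the reaches do not grow cubically; they are smaller. More precisely, from the exhaustive search as of Figure 11 we know that the values of $\ell_{\max}(4, k)$ for $k = 4, 5, 6, 7, 8$ are $\ell = 5, 8, 13, 21, 32$. Since a cubic function is uniquely determined by 4 values, we know that the sequence $5, 8, 13, 21, \dots$ would have to continue with 33, but it continues with 32. In other words, for $p \ge 4$ the schedules do not reach as far as we might expect.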
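The extrapolated value 33 can be reproduced with a finite-difference table. The following sketch is a generic illustration (the function name `extrapolate` is chosen here): for values at equally spaced points, the next value of the unique interpolating polynomial of degree one less than the number of values is the sum of the last entries of all difference rows.

```python
def extrapolate(values):
    """Next value of the interpolating polynomial through equally spaced values.

    Builds the table of forward differences; with n values the last row is a
    single constant, which corresponds to a polynomial of degree n - 1.
    """
    rows = [list(values)]
    while len(rows[-1]) > 1:
        previous = rows[-1]
        rows.append([b - a for a, b in zip(previous, previous[1:])])
    # Extending every row by one entry amounts to summing the last entries.
    return sum(row[-1] for row in rows)


cubic_continuation = extrapolate([5, 8, 13, 21])  # differences 3, 5, 8 / 2, 3 / 1
```

The difference rows for 5, 8, 13, 21 end in 21, 8, 3 and 1, so the cubic continuation is 33, one more than the value 32 found by the exhaustive search.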

Since the optimal schedules for $p \ge 4$ could not yet be found in a systematic way, we provide suboptimal schedules. Their reaches grow quadratically with $k$ and cubically with $p$. To construct them, we start with defining the subschedules to be used.

Definition 82 (Total recalculation schedules). For $\ell \ge 1$, we define $T_\ell$ to be the total recalculation schedules of reach $\ell$:

\begin{align*}
T_1 &:= \varepsilon \\
T_\ell &:= (\varepsilon, T_{\ell-1}) \quad \text{for } \ell \ge 2
\end{align*}

Definition 83 (Piled total recalculation schedules). For $p \ge 2$ and $k \ge p$, we define $P_k^p$ to be the composition of uniform total recalculation schedules to form a pile, with a smaller schedule $T_2$ on the top:

\begin{align*}
P_p^p &:= T_2 \\
P_k^p &:= (P_{k-1}^p, T_{2p-3}) \quad \text{for } k \ge p + 1
\end{align*}

Lemma 84. For all schedules $P_k^p$ with $p \ge 3$ and $k \ge p + 1$:

\[ \ell(P_k^p) = 2 + (k - p)(2p - 3) \qquad \mathrm{proc}\, P_k^p = p \qquad \mathrm{res}\, P_k^p = k \]

\begin{align*}
\mathrm{procp}\, P_k^p &= \varphi_0 + \sum_{i=1}^{p-2} \varphi_{m+2+2i} + \varphi_\ell - \sum_{i=1}^{p-1} \varphi_{\ell+m+2i} - \varphi_{2\ell} \\
\mathrm{resp}\, P_k^p &= 2\varphi_0 + \sum_{i=1}^{k-p-1} \varphi_{i(2p-3)} + \sum_{i=1}^{p-3} \varphi_{m+2+2i} + \varphi_{\ell-2} + \varphi_{\ell-1} \\
&\quad - \sum_{i=1}^{k-p-1} \varphi_{\ell+i(2p-3)} - \sum_{i=1}^{p-2} \varphi_{\ell+m+2i} - \varphi_{2\ell-2} - \varphi_{2\ell-1} - \varphi_{2\ell}
\end{align*}

where:

\begin{align*}
\ell &= \ell(P_k^p) \\
m &= \ell - 2p + 1
\end{align*}

Definition 85 (Suboptimal schedules). For $p \ge 4$, we define:

\begin{align*}
S_{p+1}^p &:= (P_p^{p-1}, T_{2p-5}) \\
S_k^p &:= (S_{k-1}^p, P_{k-2}^{p-1}) \quad \text{for } k \ge p + 2
\end{align*}

Example 86. Figure 15 demonstrates the construction of $S_8^4$ and shows some of the used subschedules.

Theorem 87. The schedules $S_k^p$ for $p \ge 4$ and $k \ge p + 1$ have the following reaches:

\[ \ell(S_k^p) = \left( \frac{(k - p)(k - p - 1)}{2} + 2 \right)(2p - 5) + 2(k - p) \]

For $k \ge p + 2$ they have the following processes profile:

\begin{align*}
\mathrm{procp}\, S_k^p &= \varphi_0 + \sum_{i=1}^{p-3} \varphi_{m+4+2i} + \varphi_\ell + \varphi_{\ell+2p-5} - \varphi_n - \sum_{i=1}^{p-2} \varphi_{\ell+m+2+2i} - \varphi_{2\ell} \\
\mathrm{proc}\, S_k^p &= p
\end{align*}

where:

\begin{align*}
\ell &= \ell(S_k^p) \\
m &= \ell - 2p + 1 \\
n &= 2\ell - (2p - 5)(k - p) - 2
\end{align*}

Proof. The profile is calculated via Lemma 64 using Lemma 84. The reach and maximum processes follow directly from the profile. □

Conjecture 88. For $k \ge p + 2$ the suboptimal schedules all satisfy the resources requirement:

\[ \mathrm{res}\, S_k^p = k \]
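The construction of the suboptimal schedules can be sketched as follows. This is a hedged illustration, not the thesis tool: it assumes the recursions $T_\ell = (\varepsilon, T_{\ell-1})$, $P_k^p = (P_{k-1}^p, T_{2p-3})$, $S_{p+1}^p = (P_p^{p-1}, T_{2p-5})$ and $S_k^p = (S_{k-1}^p, P_{k-2}^{p-1})$, which are consistent with the schedules shown in Figure 15, models schedules as nested tuples, and uses hypothetical function names throughout. The reach is taken as half the duration of the processes profile.

```python
from fractions import Fraction

EPS = "eps"


def phi(t, c=1):
    return {Fraction(t): c}


def add(*profiles):
    total = {}
    for f in profiles:
        for t, c in f.items():
            total[t] = total.get(t, 0) + c
    return {t: c for t, c in total.items() if c != 0}


def duration(f):
    return max(f) if f else Fraction(0)


def shift(f, a):
    return {t + a: c for t, c in f.items()}


def arrange(f, g):
    return add(shift(f, duration(g) / 2), shift(g, duration(f)))


def compose_proc(f, g):
    return add(arrange(f, g), phi(0), phi(duration(g) / 2, -1))


def procp(schedule):
    """Processes profile of a nested-tuple schedule."""
    if schedule == EPS:
        return add(phi(0), phi(2, -1))
    left, right = schedule
    return compose_proc(procp(left), procp(right))


def total_recalculation(reach):
    """T_l: eps composed in front of itself l - 1 times."""
    schedule = EPS
    for _ in range(reach - 1):
        schedule = (EPS, schedule)
    return schedule


def pile(p, k):
    """P_k^p: T_2 followed by k - p copies of T_{2p-3}."""
    schedule = total_recalculation(2)
    for _ in range(k - p):
        schedule = (schedule, total_recalculation(2 * p - 3))
    return schedule


def suboptimal(p, k):
    """S_k^p for p >= 4 and k >= p + 1 (assumed recursion, see lead-in)."""
    schedule = (pile(p - 1, p), total_recalculation(2 * p - 5))
    for j in range(p + 2, k + 1):
        schedule = (schedule, pile(p - 1, j - 2))
    return schedule


def reach(schedule):
    return duration(procp(schedule)) / 2


def predicted_reach(p, k):
    """Reach formula of Theorem 87."""
    d = k - p
    return (d * (d - 1) // 2 + 2) * (2 * p - 5) + 2 * d
```

For instance, `reach(suboptimal(4, 8))` corresponds to the schedule whose construction is shown in Figure 15.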
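$T_3 = (\varepsilon, (\varepsilon, \varepsilon))$
$\mathrm{procp}\, T_3 = \varphi_0 + \varphi_2 - \varphi_5 - \varphi_6$
$\mathrm{resp}\, T_3 = 2\varphi_0 + \varphi_2 - \varphi_4 - \varphi_5 - \varphi_6$

$P_6^3 = (((T_2, T_3), T_3), T_3)$
$\mathrm{procp}\, P_6^3 = \varphi_0 + \varphi_{10} + \varphi_{11} - \varphi_{19} - \varphi_{21} - \varphi_{22}$
$\mathrm{resp}\, P_6^3 = 2\varphi_0 + \varphi_3 + \varphi_6 + \varphi_9 + \varphi_{10} - \varphi_{14} - \varphi_{17} - \varphi_{19} - \varphi_{20} - \varphi_{21} - \varphi_{22}$

$S_8^4 = ((((P_4^3, T_3), P_4^3), P_5^3), P_6^3)$
$\mathrm{procp}\, S_8^4 = \varphi_0 + \varphi_{31} + \varphi_{32} + \varphi_{35} - \varphi_{50} - \varphi_{61} - \varphi_{63} - \varphi_{64}$
$\mathrm{resp}\, S_8^4 = 2\varphi_0 + \varphi_{11} + \varphi_{19} + \varphi_{24} + \varphi_{27} + \varphi_{30} + \varphi_{31} - \varphi_{50} - \varphi_{53} - \varphi_{56} - \varphi_{59} - \varphi_{61} - \varphi_{62} - \varphi_{63} - \varphi_{64}$
$\ell(S_8^4) = 32$

Figure 15.: Construction of schedule $S_8^4$.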


4.7. Summary 


Figure 16 shows the reach of the optimal and suboptimal schedules defined so far. For comparison, for each new schedule the reach of the largest applicable Fibonacci schedule is shown in parentheses. For small $k$ the Fibonacci schedules are usually better. For larger $k$ the advantages of the new schedules become apparent.

Comparing the reach of the suboptimal schedules ($p \ge 4$) with the optima calculated by exhaustive search (Figure 11), we see that $S_8^4$ is indeed optimal. Moreover, $S_9^4$ with reach 46 < 47 and $S_{10}^4$ with reach 63 < 67 are close to optimal. However, $S_{10}^5$ with reach 70 < 84 indicates that with larger $p$ and $k$ the gap will probably become larger.

Figure 16.: Reach $\ell(S_k^p)$ of all schedules defined so far. The numbers in parentheses show the reach of the largest applicable Fibonacci schedule.

5. Conclusion

As expected, relaxing the central assumption of processor-checkpoint convertibility was very challenging, and while the research presented within this thesis cannot be a comprehensive discussion on that topic, it does provide new insights into what the new schedules look like, as well as new approaches and tools to analyze them.

Using a new and more compact representation of schedules, it was shown that checkpoint persistence still holds, while processor persistence and hence binary decomposition do not hold anymore. A comprehensive algebra on profile functions was developed, which provided a compact description of how the schedule profiles behave under various operations. This algebra enabled the analysis of associators of profiles, which in turn led to inequalities that ruled out a whole class of schedule compositions that are guaranteed to provide non-optimal schedules. The profile algebra turned out to be an appropriate instrument to analyze parallel reversal schedules via their profiles. New optimal schedules for a small number of processes were developed, to be applied in situations where the known Fibonacci schedules are not applicable. Additionally, suboptimal schedules were created where optimal schedules could not be found in a systematic way.

Future research may try to find the remaining optimal schedules, or may try to improve 
the suboptimal schedules presented here. Also, more assumptions should be relaxed to 
make the schedules more realistic. Relaxing the processor-checkpoint convertibility is 
only one of many possibilities. For example, the assumption t = t = t = 1 could be 
relaxed, to take into account that P and R actions usually take more time than F actions, 
which has been done for the Fibonacci schedules but was outside the scope of the new 
schedules. In addition, all schedules so far ignore communication costs of transferring 
data within a parallel computer system, which is also an open issue for the Fibonacci 
schedules. 

The profile algebra may be useful in other research fields, as it is very generic at its core. It should be possible to apply this algebra to any sequence of events, where an event is considered to be anything that affects resource usage at a certain point in time. Moreover, the algebra could be generalized to handle multiple resources at once. For example, instead of having separate profiles for processes and resources, there could be combined profiles that describe processes and resources at once. This generalized algebra would be defined on functions $f : \mathbb{R} \to \mathbb{R}^n$ rather than $f : \mathbb{R} \to \mathbb{R}$, so that a typical profile would carry vector-valued coefficients in front of the $\varphi_t$. The partial order $\le$ would generalize in the usual way and almost all rules of Section 3.5 should still apply. Only the rules involving $\varphi_t$ would have to be rewritten using the new primitive functions $B = \{ e_i \varphi_t \mid i \in \{1, \dots, n\},\ t \in \mathbb{R}_{\ge 0} \}$.

A. Tool for Generating Schedules


Along with this thesis a program has been developed to support the research. Most of 
what is formally described here, such as the profile algebra, is also implemented within 
that tool. Almost all schedule pictures and profiles have been automatically generated 
by this tool. Also, the exhaustive search as of Section 4.2 is implemented within that 
tool. 

It is implemented in OCaml [INR15], a programming language that provides very good support for high-level programming, while compiling down to efficient native code whose speed is comparable to C and C++. The correctness of the program is secured by the strict ML type system combined with a comprehensive set of unit tests. This allows for fast refactoring, which enabled the program to be developed simultaneously with the theory, influencing each other.

The tool and related resources will be published at:

https://njh.eu/prs

If the tool “prs” is called without any arguments, the list of commands is shown: 

Usage : 

prs s PROCESSES RESOURCES 
prs sp PROCESSES RESOURCES 
prs gen PROCESSES RESOURCES 
prs tree TREE 
prs tree_letters TREE 

prs tree_search MAX_PROCESSES MAX_RESOURCES MAX_REACH 

prs old_tree_search PROCESSES MAX_RESOURCES 

prs ascii < PRIME_SCHEDULE 

prs ascii_small < PRIME_SCHEDULE 

prs fibers FIBERS 

prs lp PROCESSES REACH 

prs search MAX_PROCESSES MAX_RESOURCES 

prs latex_tree TREE 

prs latex_gen PROCESSES RESOURCES 

prs latex_ps < PRIME_SCHEDULE 

prs test 

The command “prs test” runs all unit tests, “prs search” runs the exhaustive search, and “prs gen” generates the optimal or suboptimal schedules as defined in Chapter 4. The commands “prs s” and “prs sp” also generate those schedules, but have a less verbose output. Arbitrary binary-decomposable schedules can be analyzed with “prs tree”.

For example, “prs gen 3 6” generates the optimal schedule for $p = 3$, $k = 6$, draws it in ASCII art, shows the tree structure (where e means $\varepsilon$) and shows the profiles (where -1023 means $-\varphi_{23}$). Note that the ASCII art schedules are rotated by −90°.


12 |\ 

1 2 | ,\ 

1 2 | . ,\ 

1 2 | . . A 

1 3 I ... A 

1 3 I ... I A 

1 3 I ... I . A 

1 4 I ... I .. A 

1 4 I ... I .. | A 

1 5 I . . . I . . I . A 

1 6 I . . . I . . I . I A 

1 6 | ... I .. | . | | A 

2 6 I ... I .. I - I A/ 

3 6 I ... I .. A A/ 

3 6 I ... A. I A/ 

3 6 I ... I A A/ 

3 6 A..IIA/ 

3 6 I A. I A/ 

3 6 IIAA/ 

2 5 IMA/ 

2 4 | | A/ 

2 3 | A/ 

2 2 A/ 

11 ./ 

3 6 12 

S = (((((e,e) ,e) , (e , e) ) , ((e,e) ,e)) , ( ( (e , e) ,e) ,e)) 
procp = +100, +1012, +1013, -1019 , -1023 , -1024 

resp = +200, +104, +107, +109, +1010, -1019, -1020, -1021, -1022, -1023, -1024 
reach = 12

B. Time Split in Schedule Algebra

This appendix demonstrates how to determine what exactly happens within a schedule $f$ at a certain point in time $t$, when $f$ is described by the schedule algebra as of Chapter 2. To answer that question, we introduce a time split operation that splits $f$ into a subschedule before $t$ and a list of subschedules starting at $t$.

Note that this approach is mentioned just for reference. The recommended way to deal with time is through schedule profiles as explained in Chapter 3.

Definition 89 (Shortest task duration). The shortest task duration $o(f)$ of a schedule $f \in \mathbb{Z}\langle A \rangle$ is the duration of its shortest task:$^1$

\[ o(f) = \begin{cases} 0 & \text{for } f = 0 \\ \min\{ |w_1|, \dots, |w_n| \} & \text{for } f = a_1 w_1 + \cdots + a_n w_n,\ w_i \in A^*,\ a_i \in \mathbb{Z} \setminus \{0\} \end{cases} \]

Lemma 90 (Time split). For every schedule $f \in \mathbb{Z}\langle A \rangle$ and time $t \in \{1, \dots, o(f)\}$ there exist tasks $w_1, \dots, w_n \in A^*$ and schedules $g_1, \dots, g_n \in \mathbb{Z}\langle A \rangle$ such that

\[ f = w_1 g_1 + \cdots + w_n g_n = \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix} \cdot \begin{pmatrix} g_1 \\ \vdots \\ g_n \end{pmatrix} \tag{43} \]

where $|w_i| = t$ and the $w_i$ are pairwise distinct and uniquely determined except for their order.

Proof. From the preconditions it follows that $f \neq 0$, because for $f = 0$ there is no $t \in \{1, \dots, o(0)\} = \emptyset$. So we can represent $f$ as:

\[ f = a_1 u_1 + \cdots + a_k u_k \quad \text{with } u_j \in A^*,\ a_j \in \mathbb{Z} \setminus \{0\} \]

For every $u_j$ ($j = 1, \dots, k$) we have $t \le o(f) \le |u_j|$, hence we can split $u_j$ into a prefix $p_j$ of duration $t$ and a suffix $s_j$ that may be empty:

\[ u_j = p_j s_j \quad \text{with } p_j, s_j \in A^*,\ |p_j| = t \]

Let $T = \{p_1, \dots, p_k\}$ be the set of prefixes and $n = |T|$. Let $\{w_1, \dots, w_n\} = T$ be a representation of $T$ with pairwise distinct $w_i$ for $i = 1, \dots, n$. Then every $p_j$ is equal to exactly one of the $w_i$. For each $i = 1, \dots, n$ we define $g_i$ to contain all suffixes (with their coefficients) of tasks whose prefix is $w_i$:

\[ g_i = \sum_{j :\, p_j = w_i} a_j s_j \]

This satisfies (43):

\[ \sum_{i=1}^{n} w_i g_i = \sum_{i=1}^{n} \sum_{j :\, p_j = w_i} a_j w_i s_j = \sum_{i=1}^{n} \sum_{j :\, p_j = w_i} a_j p_j s_j = \sum_{i=1}^{n} \sum_{j :\, p_j = w_i} a_j u_j = \sum_{j=1}^{k} a_j u_j = f \]

The uniqueness follows from the fact that all $w_i$ have the same duration, so these must be exactly the prefixes of the tasks of $f$, and those prefixes are uniquely determined except for their order. □

$^1$ This is also known as the order [Coh85, p. 60], but that term would be confusing as soon as we use it in its other meaning to describe the order within a sequence.

Example 91. We consider again the schedule of Example 12: 

$$f = F^3 PR + F C^2 F PR + F C^4 PR + C^6 PR$$ 

Splitting $f$ at $t = 3$ yields: 

$$f = \begin{pmatrix} F^3 & F C^2 & C^3 \end{pmatrix} \begin{pmatrix} PR \\ FPR + C^2 PR \\ C^3 PR \end{pmatrix}$$ 
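The construction in the proof of Lemma 90 can be sketched in Python. This is only an illustration under assumptions that are not part of the thesis: a schedule is stored as a dictionary from task words (strings over the action letters) to nonzero integer coefficients, every action takes one time unit so that the duration of a task is its string length, and the names `shortest_task_duration`, `time_split` and `recombine` are hypothetical.

```python
from collections import defaultdict


def shortest_task_duration(f):
    """o(f): length of the shortest task, and 0 for the empty schedule."""
    return min((len(task) for task in f), default=0)


def time_split(f, t):
    """Group the tasks of f by their prefix of duration t.

    Returns a dict {w_i: g_i}, where each g_i is again a schedule
    holding the suffixes (with coefficients) of the tasks starting with w_i.
    """
    if not 1 <= t <= shortest_task_duration(f):
        raise ValueError("t must lie in {1, ..., o(f)}")
    parts = defaultdict(lambda: defaultdict(int))
    for task, coeff in f.items():
        prefix, suffix = task[:t], task[t:]
        parts[prefix][suffix] += coeff
    return {w: dict(g) for w, g in parts.items()}


def recombine(parts):
    """Evaluate the sum of w_i * g_i, which should give back f."""
    f = defaultdict(int)
    for w, g in parts.items():
        for suffix, coeff in g.items():
            f[w + suffix] += coeff
    return dict(f)


# The schedule of Example 91, with powers written out as repeated letters.
f = {"FFFPR": 1, "FCCFPR": 1, "FCCCCPR": 1, "CCCCCCPR": 1}
parts = time_split(f, 3)
for w, g in parts.items():
    print(w, "->", g)
```

With this representation, `recombine(parts)` reproduces `f`, which mirrors the chain of equalities that establishes (43).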



Remark. This lemma can be generalized to work for all $t \in \{1, \dots, |f|\}$ rather than 
$t \in \{1, \dots, o(f)\}$, by introducing an additional summand that contains all tasks whose 
duration is smaller than $t$. 
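A possible reading of this generalization is sketched below in Python. The assumptions are ours, not the thesis's: schedules are dictionaries from task words to nonzero integer coefficients, every action has unit duration, $|f|$ is taken to be the duration of the longest task, and the function name `general_time_split` is hypothetical.

```python
def general_time_split(f, t):
    """Split f at time t into (rest, parts).

    rest  : tasks of f that are shorter than t (the additional summand)
    parts : {prefix of duration t: schedule of the matching suffixes}
    """
    longest = max((len(task) for task in f), default=0)
    if not 1 <= t <= longest:
        raise ValueError("t must lie in {1, ..., |f|}")
    rest, parts = {}, {}
    for task, coeff in f.items():
        if len(task) < t:
            rest[task] = coeff
        else:
            parts.setdefault(task[:t], {})[task[t:]] = coeff
    return rest, parts


f = {"FFFPR": 1, "FCCFPR": 1, "FCCCCPR": 1, "CCCCCCPR": 1}
rest, parts = general_time_split(f, 6)

# Rebuild f as rest + sum of w_i * g_i to check the decomposition.
rebuilt = dict(rest)
for w, g in parts.items():
    for suffix, coeff in g.items():
        rebuilt[w + suffix] = rebuilt.get(w + suffix, 0) + coeff
```

Here the task of duration 5 ends up in `rest`, and the task of duration exactly 6 contributes the empty word as its suffix.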
C. Polynomials as Profiles 


Motivation. The profile operations defined in Chapter 3 generalize various well-known 
operations on polynomials: 

1. The addition and scalar multiplication translate directly to polynomials. 

2. The shift operation $\gg$ generalizes the multiplication with monomials $X^a \cdot P$. 

3. The duration operation $|f|$ generalizes the degree of polynomials: $\deg P$. 

4. The final value $f(|f|)$ generalizes the evaluation at $X = 1$, which is also known as 
the sum of coefficients. 

While the connection between polynomials and profile functions became obvious early 
in the work on this thesis, it turned out to be of no apparent use. The main problems are: 

1. Polynomial multiplication doesn't make sense on profiles. While it does make 
sense in the special cases of multiplication with scalars $v \cdot P$ and multiplication 
with monomials $X^a \cdot P$, the multiplication of two arbitrary profiles has no apparent 
meaning. 

2. The partial order $\leq$, an essential part of the algebra, doesn't translate well to 
polynomials. For example, we might hope that nonnegativity could be handled 
by treating polynomials as functions. However, consider the following polynomial 
and its corresponding profile: 

$$P = X^0 - 2X^1 + X^2 = (1 - X)^2$$ 

$$f = \varphi_0 - 2\varphi_1 + \varphi_2$$ 

While $P$ is a square and thus $P \geq 0$ by any natural definition, we have $f(t) = -1$ 
for all $t \in [1, 2)$ and hence $f \not\geq 0$. Of course, we can apply Lemma 47 directly 
to the coefficients of $P$, but that gives up yet another advantage of using 
polynomials. 
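The counterexample can be checked numerically with a few lines of Python. The sketch assumes that the basis profile $\varphi_a$ is the unit step at time $a$, that is $\varphi_a(t) = 1$ for $t \geq a$ and $0$ otherwise, which is consistent with the values quoted above but is not restated in this appendix. The function names are hypothetical.

```python
def poly_value(coeffs, x):
    """Evaluate the polynomial sum_i coeffs[i] * x**i."""
    return sum(c * x**i for i, c in enumerate(coeffs))


def profile_value(coeffs, t):
    """Evaluate the profile sum_i coeffs[i] * phi_i at time t,
    assuming phi_i(t) = 1 for t >= i and 0 otherwise."""
    return sum(c for i, c in enumerate(coeffs) if t >= i)


coeffs = [1, -2, 1]  # P = 1 - 2X + X^2 and f = phi_0 - 2 phi_1 + phi_2
samples = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]

poly_values = [poly_value(coeffs, x) for x in samples]
profile_values = [profile_value(coeffs, t) for t in samples]

for t, p, f in zip(samples, poly_values, profile_values):
    print(f"t = {t}: P(t) = {p}, f(t) = {f}")
```

The polynomial values are all nonnegative, while the profile drops to $-1$ at the sample points inside $[1, 2)$.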

Nevertheless, it seems worth mentioning this connection, so we will describe it in exact 
terms. 

Definition 92 (Polynomials as profiles). We define $\Psi$ to be the uniquely determined 
$\mathbb{R}$-linear map from the polynomials $\mathbb{R}[X]$ to the profiles $\mathcal{P}$ that maps the basis 
elements as follows: 

$$\Psi : \mathbb{R}[X] \to \mathcal{P}$$ 
$$\Psi : X^a \mapsto \varphi_a \quad (a \in \mathbb{Z}_{\geq 0})$$ 
Remark. Here we treat $\mathbb{R}[X]$ as an ordinary $\mathbb{R}$-linear space with basis $(X^0, X^1, X^2, \dots)$. 
That is, we ignore the possibility of polynomial multiplication and care only about scalar 
multiplication. 


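Lemma 93 (Polynomial operations on profiles). Let $F, G \in \mathbb{R}[X]$, $v \in \mathbb{R}$, $a \in \mathbb{Z}_{\geq 0}$. 
Further, let 

$$f = \Psi(F)$$ 
$$g = \Psi(G)$$ 

Then: 

$$f + g = \Psi(F + G) \tag{44}$$ 
$$v f = \Psi(v \cdot F) \tag{45}$$ 
$$f \gg a = \Psi(X^a \cdot F) \tag{46}$$ 
$$|f| = \deg F \tag{47}$$ 
$$f(|f|) = F(1) \tag{48}$$ 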


Proof. (44) and (45) follow directly from the definition of $\Psi$ as an $\mathbb{R}$-linear map. The 
other equalities are obvious for $F = 0$. It remains to prove (46) to (48) for $F \neq 0$, which 
allows us to represent $F$ as: 

$$F = \sum_{i=0}^{n} \alpha_i X^i \quad \text{with } \alpha_n \neq 0$$ 

Then, we apply $\Psi$ to both sides: 

$$f = \sum_{i=0}^{n} \alpha_i \varphi_i \quad \text{with } \alpha_n \neq 0$$ 

Finally, we apply Lemma 44 at $t = n$ with $(t_1, \dots, t_{n+1}) = (0, \dots, n)$: 

$$f(n) = \sum_{i=0}^{n} \alpha_i$$ 

With all that in mind, we can prove (46) to (48): 

$$f \gg a = \sum_{i=0}^{n} \alpha_i \varphi_{i+a} = \sum_{i=0}^{n} \alpha_i \Psi(X^{i+a}) = \Psi\left(\sum_{i=0}^{n} \alpha_i X^{i+a}\right) = \Psi(X^a \cdot F)$$ 

$$|f| = n = \deg F$$ 

$$f(|f|) = f(n) = \sum_{i=0}^{n} \alpha_i = F(1)$$ 

□ 
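Identities (46) and (48) can be spot-checked numerically. The following Python sketch rests on two assumptions that are not spelled out in this appendix: $\varphi_i$ is the unit step at time $i$, and the shift acts pointwise as $(f \gg a)(t) = f(t - a)$. Polynomials are stored as coefficient lists, and the names `psi`, `times_monomial` and `degree` are hypothetical.

```python
def psi(F):
    """Map a coefficient list F (F[i] belongs to X^i) to a profile,
    returned as a function t -> sum of F[i] over all i <= t."""
    return lambda t: sum(c for i, c in enumerate(F) if t >= i)


def times_monomial(F, a):
    """Coefficient list of X^a * F."""
    return [0] * a + list(F)


def degree(F):
    """Degree of a nonzero polynomial given as a coefficient list."""
    return max(i for i, c in enumerate(F) if c != 0)


F = [3, 0, -1, 2]  # F = 3 - X^2 + 2 X^3
a = 2

f = psi(F)
shifted = psi(times_monomial(F, a))

# (46): Psi(X^a * F) agrees with f shifted by a on a grid of sample times.
grid = [k / 2 for k in range(16)]
eq46_holds = all(shifted(t) == f(t - a) for t in grid)

# (48): the value of f at t = deg F equals F(1), the sum of coefficients.
eq48_holds = f(degree(F)) == sum(F)

print(eq46_holds, eq48_holds)
```

A check on sample points is of course no substitute for the proof; it only guards against misreading the definitions.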
D. Notable Search Result for p=4, k=8 


The exhaustive search produced an interesting optimal schedule for p = 4 and k = 8, 
which is shown in Figure 17. Here, processor persistence is violated and cannot be fixed 
locally. It is impossible to replace the violating C action with an F action, because 
at that time all p = 4 processors are already busy. It is only possible to fix this by 
replacing another F action at that time with a C action. But that introduces a violation 
at that new place, as demonstrated in Figure 18. Moving it further down even leads to 
two violations, as seen in Figure 19. Moreover, this second new schedule now uses k = 9 
resources, hence it is no longer an optimal schedule. This example demonstrates that in 
general it is not possible to deal with violations of processor persistence through local 
modifications. 

Note, however, that we know that this particular schedule can be fixed globally. 
In Section 4.6 we constructed the binary-composable (and therefore processor-persistent) 
schedule S|, which has the same optimal reach i = 32. 